Watanabe et al., EMNLP 2007. Online Large-Margin Training for Statistical Machine Translation
Citation
Taro Watanabe, Jun Suzuki, Hajime Tsukada, Hideki Isozaki. 2007. Online large-margin training for statistical machine translation. In Proceedings of EMNLP-CoNLL. pp 764–773
Online Version
Online large-margin training for statistical machine translation
Summary
This paper introduces an online discriminative large-margin training approach to statistical machine translation. The authors achieved then state-of-the-art performance on an Arabic-English translation task by tuning a combination of millions of features in an MT system. In doing so, they also addressed the problem of scaling machine translation systems to feature sets on the order of millions.
Method
The paper presents a method to estimate a large number of parameters, on the order of millions, using an online training algorithm for machine translation. The algorithm used in this work is the Margin Infused Relaxed Algorithm (MIRA), which has been successfully employed for many structured natural language processing tasks such as dependency parsing and joint labeling/chunking. This method is applied to an enhanced hierarchical phrase-based machine translation model.
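The core of a MIRA-style update is to shift the weight vector just enough that the correct output outscores the model's prediction by a margin proportional to its loss. The sketch below is a minimal single-constraint (1-best) variant over sparse feature dictionaries; the function name, the dict representation, and the clip constant `C` are illustrative assumptions, not the paper's exact multi-constraint formulation.

```python
def mira_update(w, feat_gold, feat_pred, loss, C=1.0):
    """One MIRA-style update: minimally move w (a sparse dict) so the
    gold output outscores the prediction by at least `loss`."""
    # Difference of sparse feature vectors: gold minus prediction.
    delta = {k: feat_gold.get(k, 0.0) - feat_pred.get(k, 0.0)
             for k in set(feat_gold) | set(feat_pred)}
    # Current margin of gold over prediction under w.
    margin = sum(w.get(k, 0.0) * v for k, v in delta.items())
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return w  # identical feature vectors: nothing to update
    # Step size: satisfy the margin constraint, clipped at C.
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    new_w = dict(w)
    for k, v in delta.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w
```

After one update from zero weights with loss 1.0, the gold output's score exceeds the prediction's by exactly the loss, which is the defining property of the update.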
Hierarchical Phrase-based SMT
Chiang (2005) introduced the hierarchical phrase-based translation approach, in which non-terminals are embedded in each phrase. A translation is generated by hierarchically combining phrases using the non-terminals. Such a quasi-syntactic structure can naturally capture the reordering of phrases that is not directly modeled by a conventional phrase-based approach.
Each production rule in the hierarchical phrase-based translation model is given by:

X → ⟨γ, α, ∼⟩

where X is a non-terminal, γ is a source-side string of arbitrary terminals and/or non-terminals, α is the corresponding target-side string of terminals and/or non-terminals, and ∼ defines a one-to-one mapping between the non-terminals in γ and α.
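A translation is produced by recursively substituting sub-translations into the linked non-terminal slots of a rule's source and target sides. The toy sketch below illustrates this substitution with string templates; the template notation (`X1`, `X2`) and the helper name are assumptions for illustration, and the example rule is the well-known ⟨X1 de X2, the X2 of X1⟩ reordering rule from Chiang (2005).

```python
def apply_rule(src_template, tgt_template, fillers):
    """Substitute sub-translations into the linked non-terminal slots
    of a hierarchical rule. `fillers` maps a non-terminal index to a
    (source_string, target_string) pair."""
    src, tgt = src_template, tgt_template
    for idx, (s, t) in fillers.items():
        # The same index appears on both sides: the one-to-one mapping.
        src = src.replace(f"X{idx}", s)
        tgt = tgt.replace(f"X{idx}", t)
    return src, tgt
```

Note how the rule itself encodes the phrase reordering: X2 precedes X1 on the target side even though X1 comes first on the source side.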
Features
The authors build an enhanced translation model on top of the baseline hierarchical phrase-based model. They introduce a very large number of binary features based on word alignments, dependency structures, and context.
Baseline
Please refer to Chiang (2005) for the baseline features.
Sparse Features
Sparse features are binary indicator features of the form:

h_k(f, e) = 1 if feature k fires in the translation pair (f, e), and 0 otherwise.
These features are categorized as:
- Word pair features using word alignments within a standard phrase pair
- Insertion features to take care of spurious words on the target side.
- Target-side word bigram features
- Hierarchical features to capture dependencies between parent and child words on source and target sides.
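The first three feature types above can be sketched as simple string indicators extracted from an aligned phrase pair. The feature-name prefixes (`wp:`, `ins:`, `bigram:`) and the set-of-strings representation below are illustrative assumptions, not the paper's exact encoding.

```python
def extract_features(src_words, tgt_words, alignment):
    """Extract binary features (as a set of strings) from a phrase pair.
    `alignment` is a list of (src_index, tgt_index) word-alignment links."""
    feats = set()
    # Word-pair features: one per alignment link.
    for i, j in alignment:
        feats.add(f"wp:{src_words[i]}|{tgt_words[j]}")
    # Insertion features: target words with no alignment link
    # (candidate spurious insertions).
    aligned_tgt = {j for _, j in alignment}
    for j, w in enumerate(tgt_words):
        if j not in aligned_tgt:
            feats.add(f"ins:{w}")
    # Target-side bigram features over adjacent target words.
    for w1, w2 in zip(tgt_words, tgt_words[1:]):
        feats.add(f"bigram:{w1}|{w2}")
    return feats
```

Each extracted string becomes one dimension of the sparse binary feature vector, which is why the full feature space grows to millions of dimensions over a training corpus.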
The authors also apply several kinds of normalization so that the feature set generalizes better.
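As a hypothetical illustration of such normalization (the exact scheme used in the paper is not reproduced here), a common recipe is to lower-case tokens and collapse digit runs so that rare surface forms map to shared features:

```python
import re

def normalize(token):
    # Hypothetical normalization: lower-case and collapse digit runs
    # to a single placeholder so e.g. "1997" and "2003" share features.
    return re.sub(r"\d+", "#", token.lower())
```

Under this mapping, distinct numbers or casings fire the same binary features, reducing sparsity.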