DeNero et al, EMNLP 2008
Revision as of 14:11, 25 September 2011
Citation
DeNero, J., Bouchard-Côté, A., & Klein, D. (2008). Sampling alignment structure under a Bayesian translation model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08). Association for Computational Linguistics, Stroudsburg, PA, USA, 314-323.
Online version
Summary
Unlike word-to-phrase alignments, computing alignment expectations for phrase-to-phrase alignments is generally intractable, because the number of possible phrase segmentations and alignments between them grows exponentially with sentence length. As a result, previous attempts at building a joint phrase alignment model have been unsuccessful.
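To make the growth concrete, the sketch below counts joint phrase alignments under a deliberately simplified model space (an assumption, not the paper's exact one): both sentences are split into the same number k of contiguous phrases, and those phrases are paired one-to-one in any order, with no unaligned phrases.

```python
from math import comb, factorial


def count_segmentations(n: int) -> int:
    """Ways to split an n-word sentence into contiguous phrases."""
    # Each of the n - 1 gaps between adjacent words is either a phrase boundary or not.
    return 2 ** (n - 1)


def count_phrase_alignments(m: int, n: int) -> int:
    """Joint phrase alignments for an m-word and an n-word sentence under a
    simplified model (assumption): both sides are split into the same number k
    of contiguous phrases, which are then paired one-to-one in any order."""
    total = 0
    for k in range(1, min(m, n) + 1):
        # Choose k - 1 boundaries on each side, then one of k! phrase pairings.
        total += comb(m - 1, k - 1) * comb(n - 1, k - 1) * factorial(k)
    return total


for length in (2, 4, 8, 16):
    print(length, count_segmentations(length), count_phrase_alignments(length, length))
```

Even with these restrictions, two 4-word sentences already admit 97 joint phrase alignments, and two 16-word sentences admit more than 2^40, so summing over all of them to compute expectations is not practical.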
This paper describes the first tractable phrase-to-phrase alignment model, which uses Gibbs sampling to approximate these expectations instead of computing them exactly.
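The core idea, estimating expectations by sampling when exact summation is infeasible, can be illustrated on a toy problem. The sketch below is not the paper's sampler: it uses a hypothetical log-linear score over individual word links (per-link weights plus a penalty on words with more than one link, both invented for illustration) and resamples one link at a time from its conditional distribution. On a 2×3 link matrix the exact marginals can still be enumerated, so the Gibbs estimates can be checked against them.

```python
import itertools
import math
import random

# Toy model (hypothetical): each possible link (i, j) between source word i and
# target word j has a weight; words with more than one link pay a penalty.


def log_score(links, weights, penalty):
    """Unnormalized log-probability of a set of (i, j) links under the toy model."""
    src_fert = [0] * len(weights)
    tgt_fert = [0] * len(weights[0])
    score = 0.0
    for i, j in links:
        score += weights[i][j]
        src_fert[i] += 1
        tgt_fert[j] += 1
    # The fertility penalty couples links that share a word, so the
    # distribution does not factor into independent per-link terms.
    score -= penalty * sum(max(0, f - 1) for f in src_fert + tgt_fert)
    return score


def all_cells(weights):
    return [(i, j) for i in range(len(weights)) for j in range(len(weights[0]))]


def exact_link_marginals(weights, penalty):
    """P(link is on) for every cell, by brute-force enumeration of all link sets."""
    cells = all_cells(weights)
    total = 0.0
    mass = dict.fromkeys(cells, 0.0)
    for bits in itertools.product((0, 1), repeat=len(cells)):
        links = {c for c, b in zip(cells, bits) if b}
        p = math.exp(log_score(links, weights, penalty))
        total += p
        for c in links:
            mass[c] += p
    return {c: mass[c] / total for c in cells}


def gibbs_link_marginals(weights, penalty, sweeps=20000, burn_in=1000, seed=0):
    """Estimate the same marginals by Gibbs sampling, resampling one link at a
    time from its conditional distribution given all other links."""
    rng = random.Random(seed)
    cells = all_cells(weights)
    links = set()
    counts = dict.fromkeys(cells, 0)
    for sweep in range(sweeps):
        for c in cells:
            on, off = links | {c}, links - {c}
            delta = log_score(on, weights, penalty) - log_score(off, weights, penalty)
            p_on = 1.0 / (1.0 + math.exp(-delta))  # P(link on | all other links)
            links = on if rng.random() < p_on else off
        if sweep >= burn_in:
            for c in links:
                counts[c] += 1
    kept = sweeps - burn_in
    return {c: counts[c] / kept for c in cells}


weights = [[2.0, -1.0, 0.5], [-0.5, 1.5, 0.0]]
exact = exact_link_marginals(weights, penalty=2.0)
approx = gibbs_link_marginals(weights, penalty=2.0)
for cell in sorted(exact):
    print(cell, round(exact[cell], 3), round(approx[cell], 3))
```

Each Gibbs step only compares the scores of two configurations that differ in a single link, which is what keeps sampling cheap where enumeration is not; the paper applies the same principle to richer phrase-level moves.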
Experiments show translation improvements over machine translation systems built with the conventional pipeline of word alignment followed by heuristic phrase extraction.
Previous Work
Most alignment models cannot produce many-to-many alignments, because they restrict each word in the target sentence to align with at most one word in the source sentence. These models can therefore represent only one-to-many alignments, in which each source word can be aligned to multiple target words but not vice versa. Symmetrization is a heuristic algorithm that produces many-to-many alignments by combining a one-to-many alignment from the source sentence to the target sentence with a one-to-many alignment in the opposite direction.
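One common symmetrization variant, usually called grow-diag, starts from the intersection of the two directional alignments (high precision) and adds links from their union (high recall) that neighbor an existing link and cover a still-unaligned word. The sketch below assumes each directional alignment is given as a list mapping each position in one sentence to at most one position in the other; it is one variant among several, not necessarily the one used in the paper's baseline.

```python
from typing import Optional, Sequence, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)

# All eight neighbors of a cell in the alignment matrix, including diagonals.
NEIGHBORS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def links_from_src2tgt(a: Sequence[Optional[int]]) -> Set[Link]:
    """a[j] is the single source word aligned to target word j (None = unaligned)."""
    return {(i, j) for j, i in enumerate(a) if i is not None}


def links_from_tgt2src(b: Sequence[Optional[int]]) -> Set[Link]:
    """b[i] is the single target word aligned to source word i (None = unaligned)."""
    return {(i, j) for i, j in enumerate(b) if j is not None}


def grow_diag(s2t: Set[Link], t2s: Set[Link]) -> Set[Link]:
    """Start from the intersection; repeatedly add union links that neighbor an
    existing link and cover a source or target word that is still unaligned."""
    union = s2t | t2s
    alignment = s2t & t2s
    added = True
    while added:
        added = False
        for i, j in sorted(alignment):  # iterate over a snapshot while growing
            for di, dj in NEIGHBORS:
                cand = (i + di, j + dj)
                if cand not in union or cand in alignment:
                    continue
                src_free = all(s != cand[0] for s, _ in alignment)
                tgt_free = all(t != cand[1] for _, t in alignment)
                if src_free or tgt_free:
                    alignment.add(cand)
                    added = True
    return alignment


s2t = links_from_src2tgt([0, 1, 2])  # target word j -> source word a[j]
t2s = links_from_tgt2src([0, 2, 2])  # source word i -> target word b[i]
print("intersection:", sorted(s2t & t2s))
print("union:       ", sorted(s2t | t2s))
print("grow-diag:   ", sorted(grow_diag(s2t, t2s)))
```

Here the intersection keeps (0, 0) and (2, 2); grow-diag then adds (1, 1), which is diagonal to (0, 0) and covers the unaligned source word 1, but rejects (1, 2), because by then both of its words are already aligned.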
These alignments are then used by the Phrase Extraction Algorithm, which extracts phrase pairs based on heuristics, such as the alignment template approach of Och et al. (www.ldc.upenn.edu/acl/w/w99/w99-0604.pdf).
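The phrase extraction step can be sketched with the consistency criterion commonly used in phrase-based translation: a source span and a target span form a phrase pair if at least one alignment link falls inside both, and no link connects a word inside one span to a word outside the other. The brute-force enumeration and the `max_len` limit below are illustrative choices (assumptions), not the implementation of the paper or of Och et al.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)


def extract_phrase_pairs(
    src: List[str], tgt: List[str], links: Set[Link], max_len: int = 3
) -> List[Tuple[str, str]]:
    """Return every (source phrase, target phrase) pair of at most max_len words
    per side that is consistent with the word alignment."""
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            for t1 in range(len(tgt)):
                for t2 in range(t1, min(len(tgt), t1 + max_len)):
                    # Require at least one link inside the box spanned by both phrases.
                    if not any(s1 <= i <= s2 and t1 <= j <= t2 for i, j in links):
                        continue
                    # Reject the pair if any link has exactly one end inside the spans.
                    if any((s1 <= i <= s2) != (t1 <= j <= t2) for i, j in links):
                        continue
                    pairs.append((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


pairs = extract_phrase_pairs(["maison", "bleue"], ["blue", "house"], {(0, 1), (1, 0)})
for source_phrase, target_phrase in pairs:
    print(source_phrase, "->", target_phrase)
```

The two crossing links yield both one-word pairs plus the reordered two-word pair; pairing "maison" with "blue house" is rejected because the link from "bleue" to "blue" would cross the boundary of the source span.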