Marcus and Wong, EMNLP 2002

From Cohen Courses

Revision as of 12:19, 27 September 2011

Citation

Marcu, D., & Wong, W. (2002). A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP, pp. 133–139.

Online version

ACM

Summary

This work presents a phrase-to-phrase alignment model for Statistical Machine Translation.

Model

In this work, words are clustered into phrases by a generative process. The process constructs an ordered set of phrases <math>t_{1:m}</math> in the target language, an ordered set of phrases <math>s_{1:n}</math> in the source language, and an alignment between phrases <math>a=\{(j,k)\}</math>. Each pair <math>(j,k)</math> indicates that target phrase <math>t_j</math> and source phrase <math>s_k</math> form a phrase pair. The process consists of two steps:

  • First, the number of components <math>l</math> is chosen, and each of the <math>l</math> phrase pairs is generated independently.
  • Then, an ordering of the source phrases is chosen, and the source and target phrases are aligned one to one.
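The two-step process above can be simulated with a short sampler. This is a minimal sketch, not the authors' code. The phrase table `toy_table` and the function `sample_phrase_aligned_pair` are hypothetical. The number of components l is drawn from a geometric distribution with stop probability 0.1. For simplicity, the source ordering is drawn uniformly at random, whereas in the model it is weighted by the distortion model.

```python
import random


def sample_phrase_aligned_pair(phrase_table, p_stop=0.1, rng=None):
    """Sample (target phrases, source phrases, alignment) from a toy version of the story.

    phrase_table: hypothetical dict {(target_phrase, source_phrase): probability}
    standing in for the unknown phrase-pair multinomial.
    """
    rng = rng or random.Random()

    # Step 1a: choose the number of components l >= 1 from a geometric
    # distribution that stops with probability p_stop at each step.
    l = 1
    while rng.random() >= p_stop:
        l += 1

    # Step 1b: draw each of the l phrase pairs independently.
    pairs = list(phrase_table)
    weights = [phrase_table[pair] for pair in pairs]
    drawn = rng.choices(pairs, weights=weights, k=l)

    # Step 2: keep target phrases in draw order, pick an ordering of the source
    # phrases (uniformly here, a simplification), and align them one to one.
    targets = [t for t, _ in drawn]
    order = list(range(l))
    rng.shuffle(order)  # order[k] = index of the pair whose source phrase is k-th
    sources = [drawn[i][1] for i in order]
    alignment = [(j, order.index(j)) for j in range(l)]  # (target j, source k)
    return targets, sources, alignment


toy_table = {("the house", "la maison"): 0.6, ("is small", "est petite"): 0.4}
targets, sources, alignment = sample_phrase_aligned_pair(toy_table, rng=random.Random(0))
```

Each alignment entry (j, k) pairs target phrase j with the source phrase at position k. Every target phrase is therefore aligned to exactly one source phrase, and vice versa.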

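The choice of <math>l</math> is parametrized using a geometric distribution <math>P_G</math> with stop parameter <math>p_{\$}</math>:

<math>P(l) = P_G(l; p_{\$}) = p_{\$}\,(1 - p_{\$})^{l-1}</math>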
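To see what the stop parameter does, the geometric prior P(l) = p(1 − p)^(l−1), for l ≥ 1, can be evaluated directly. The sketch below uses 0.1 as the default stop probability, which we assume is the stop-parameter value used in the experiments. The function name `geometric_prior` is hypothetical.

```python
def geometric_prior(l, p_stop=0.1):
    """P(l) = p_stop * (1 - p_stop) ** (l - 1) for l >= 1, and 0 otherwise."""
    if l < 1:
        return 0.0
    return p_stop * (1.0 - p_stop) ** (l - 1)


# Prior probability of a few segmentation sizes.
for l in (1, 2, 5, 10):
    print(l, round(geometric_prior(l), 4))
```

With p = 0.1, each additional phrase pair multiplies the prior by 1 − p = 0.9, and the expected number of phrase pairs is 1/p = 10. The prior therefore favors fewer phrase pairs only mildly.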

Each phrase pair <math>\langle t, s\rangle</math> is drawn from an unknown multinomial distribution <math>\theta_J</math>.
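Because the phrase-pair multinomial is unknown, it has to be estimated from data. Purely to illustrate what such a distribution looks like, the sketch below computes the relative-frequency (maximum-likelihood) estimate from a hypothetical list of already-aligned phrase pairs. It then computes the probability of drawing a sequence of pairs independently from that estimate. This is not the paper's training procedure, which is not shown here. The names `mle_phrase_multinomial` and `sequence_probability` are hypothetical.

```python
from collections import Counter


def mle_phrase_multinomial(observed_pairs):
    """Relative-frequency estimate of theta_J from a list of (target, source) pairs."""
    counts = Counter(observed_pairs)
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def sequence_probability(theta, pairs):
    """Probability of drawing `pairs` in this order, each independently from theta."""
    prob = 1.0
    for pair in pairs:
        prob *= theta.get(pair, 0.0)  # pairs outside the support get probability 0
    return prob


# Hypothetical aligned phrase pairs (target, source).
observed = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("is small", "est petite"),
    ("small", "petite"),
]
theta = mle_phrase_multinomial(observed)
p = sequence_probability(theta, [("the house", "la maison"), ("is small", "est petite")])
```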

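A simple position-based distortion model is used. The probability of an alignment is proportional to the product of distortion penalties over its aligned phrase pairs:

<math>P(a \mid \{\langle t, s\rangle\}) \propto \prod_{(j,k) \in a} \delta(j,k), \qquad \delta(j,k) = b^{\left|\mathrm{pos}(t_j) - \mathrm{pos}(s_k) \cdot r\right|},</math>

where <math>\mathrm{pos}(\cdot)</math> is the word position at which a phrase starts and <math>r</math> is the ratio of the target sentence length to the source sentence length.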
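The distortion model can be made concrete for a tiny example. This is a sketch under stated assumptions. The penalty for aligning target phrase t_j with source phrase s_k is taken to be b^|pos(t_j) − r·pos(s_k)|, where pos is the 1-based start position of a phrase, r is the target-to-source length ratio, and b = 0.85. The sketch normalizes the product of penalties over every ordering of the source phrases by brute force, which is only practical for a handful of phrases. All function names are hypothetical.

```python
import itertools


def start_positions(lengths):
    """1-based word position at which each phrase starts, given phrase lengths in order."""
    starts, pos = [], 1
    for n in lengths:
        starts.append(pos)
        pos += n
    return starts


def distortion_penalty(t_pos, s_pos, ratio, b=0.85):
    """Assumed penalty b ** |pos(t_j) - pos(s_k) * ratio|."""
    return b ** abs(t_pos - s_pos * ratio)


def alignment_distribution(pairs, b=0.85):
    """Normalize the product of penalties over all orderings of the source phrases.

    pairs: list of (target_phrase, source_phrase) strings; target phrases are laid
    out in the given order. Returns {order: probability}, where order[k] is the
    index of the pair whose source phrase is placed k-th.
    """
    t_lens = [len(t.split()) for t, _ in pairs]
    s_lens = [len(s.split()) for _, s in pairs]
    ratio = sum(t_lens) / sum(s_lens)  # target length / source length
    t_starts = start_positions(t_lens)

    scores = {}
    for order in itertools.permutations(range(len(pairs))):
        placed = start_positions([s_lens[i] for i in order])
        s_start = {i: placed[k] for k, i in enumerate(order)}
        score = 1.0
        for i in range(len(pairs)):
            score *= distortion_penalty(t_starts[i], s_start[i], ratio, b)
        scores[order] = score

    z = sum(scores.values())
    return {order: score / z for order, score in scores.items()}


pairs = [("the house", "la maison"), ("is small", "est petite")]
dist = alignment_distribution(pairs)
```

In this example the monotone ordering (0, 1) gets the higher probability because both phrases start at matching positions. The swapped ordering is penalized by a factor of b^4. The enumeration grows factorially with the number of phrases, so it only illustrates the normalization.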

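Finally, the joint probability of a phrase-aligned sentence pair consisting of <math>l</math> phrase pairs is:

<math>P(\{\langle t, s\rangle\}, a) = P_G(l; p_{\$})\, P(a \mid \{\langle t, s\rangle\}) \prod_{\langle t, s\rangle} \theta_J(\langle t, s\rangle)</math>

The three factors are the geometric prior on the number of phrase pairs, the distortion model, and the phrase-pair multinomial.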
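The joint probability multiplies three factors: a geometric prior on the number of phrase pairs, the distortion probability of the alignment, and the phrase-pair probabilities. The sketch below sums them in log space, which avoids underflow for long sentences. It assumes the distortion term has already been computed and is passed in as a log-probability. The function name, the toy `theta`, and the alignment probability 0.6 are hypothetical.

```python
import math


def log_joint_probability(pairs, theta, log_align_prob, p_stop=0.1):
    """log P({<t, s>}, a) = log P_G(l; p_stop) + log P(a | pairs) + sum log theta_J(<t, s>).

    pairs: the l phrase pairs; theta: dict {pair: probability};
    log_align_prob: log-probability of the alignment under the distortion model.
    """
    l = len(pairs)
    log_p = math.log(p_stop) + (l - 1) * math.log(1.0 - p_stop)  # geometric prior
    log_p += log_align_prob  # distortion term
    for pair in pairs:
        prob = theta.get(pair, 0.0)
        if prob == 0.0:
            return float("-inf")  # pair outside the support of theta_J
        log_p += math.log(prob)
    return log_p


theta = {("the house", "la maison"): 0.5, ("is small", "est petite"): 0.25}
pairs = [("the house", "la maison"), ("is small", "est petite")]
lp = log_joint_probability(pairs, theta, log_align_prob=math.log(0.6))
```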

In the experiments, the stop parameter <math>p_{\$}</math> and the distortion parameter <math>b</math> were set to 0.1 and 0.85, respectively.