Marcu and Wong, EMNLP 2002

Citation

Marcu, D., & Wong, W. (2002). A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP, pp. 133–139.

Online version

ACM

Summary

This work presents a phrase-to-phrase alignment model for Statistical Machine Translation.

Model

In this work, words are clustered into phrases by a generative process. The process constructs an ordered set of phrases $t_{1:m}$ in the target language, an ordered set of phrases $s_{1:n}$ in the source language, and the alignments between phrases $a=\{(j,k)\}$, where each pair $(j,k)$ indicates that target phrase $t_{j}$ is aligned with source phrase $s_{k}$. The process consists of two steps:

First, the number of components $l$ is chosen, and each of the $l$ phrase pairs is generated independently.
Then, an ordering of the source phrases is chosen, and the source and target phrases are aligned one-to-one.
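As a rough illustration, the two-step generative story can be sketched in Python. The toy phrase table, the helper names `sample_derivation` and `derivation_probability`, and the uniform distributions over $l$ and over source orderings are all assumptions made for this sketch. The summary does not specify these distributions, and in the paper the phrase-pair probabilities are learned from a bilingual corpus.

```python
import math
import random
from typing import Dict, List, Tuple

# (target phrase, source phrase); hypothetical representation of a phrase pair
PhrasePair = Tuple[str, str]

# Toy joint distribution p(t, s) over phrase pairs (invented values that sum to 1).
PHRASE_TABLE: Dict[PhrasePair, float] = {
    ("the house", "la maison"): 0.5,
    ("is", "est"): 0.25,
    ("small", "petite"): 0.25,
}


def sample_derivation(
    phrase_table: Dict[PhrasePair, float],
    max_components: int,
    rng: random.Random,
) -> Tuple[List[str], List[str], List[Tuple[int, int]]]:
    """Sample target phrases t_{1:l}, source phrases s_{1:l} and alignment a."""
    # Step 1: choose the number of components l (assumed uniform on 1..max_components)
    l = rng.randint(1, max_components)
    pairs = list(phrase_table)
    weights = [phrase_table[p] for p in pairs]
    # ...then draw each of the l phrase pairs independently from p(t, s)
    drawn = rng.choices(pairs, weights=weights, k=l)
    targets = [t for t, _ in drawn]

    # Step 2: choose an ordering of the source phrases (assumed uniform);
    # source position k holds the source side of the pair drawn at index order[k]
    order = list(range(l))
    rng.shuffle(order)
    sources = [drawn[order[k]][1] for k in range(l)]

    # One-to-one alignment a = {(j, k)}: target t_j is aligned with source s_k
    alignment = sorted((order[k], k) for k in range(l))
    return targets, sources, alignment


def derivation_probability(
    targets: List[str],
    sources: List[str],
    alignment: List[Tuple[int, int]],
    phrase_table: Dict[PhrasePair, float],
    max_components: int,
) -> float:
    """Joint probability of one derivation under the same uniform assumptions."""
    l = len(alignment)
    p = 1.0 / max_components  # p(l): uniform, an assumption of this sketch
    for j, k in alignment:
        # phrase pairs are generated independently, so their probabilities multiply
        p *= phrase_table.get((targets[j], sources[k]), 0.0)
    return p / math.factorial(l)  # uniform over the l! source orderings


if __name__ == "__main__":
    rng = random.Random(0)
    t, s, a = sample_derivation(PHRASE_TABLE, max_components=3, rng=rng)
    print(t, s, a, derivation_probability(t, s, a, PHRASE_TABLE, 3))
```

Because the chosen source ordering fixes which source phrase each target phrase is paired with, the probability of a derivation in this sketch factors into $p(l)$, the product of the $l$ phrase-pair probabilities, and the probability of the ordering (here $1/l!$).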