Birch et al, StatMT 2006
Citation
Alexandra Birch, Chris Callison-Burch, and Miles Osborne. 2006. Constraining the phrase-based, joint probability statistical translation model. In The Conference for the Association for Machine Translation in the Americas.
Online version
Summary
The model proposed in Marcu and Wong, EMNLP 2002 provides a strong framework for phrase-to-phrase alignments, but its applicability is hamstrung by the computational cost of running EM over the large space of latent variables generated from all possible phrases and alignments.
This work describes a phrase-to-phrase alignment model that uses word-to-phrase alignments to constrain the space of phrasal alignments, improving both the scalability of the model and its performance on the machine translation task.
Description of the method
The joint model proposed in Marcu and Wong, EMNLP 2002 searches the space of all possible latent variables (phrases and alignments between phrases) during EM training, which is computationally expensive. The goal of this method is to impose hard constraints on the admissible latent variables using a set of high-confidence alignments.
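As a rough illustration of how such a hard constraint can shrink the search space, the sketch below enumerates candidate phrase pairs for one sentence pair and keeps only those that do not contradict the high-confidence alignment points. The consistency rule used here (no point may link a word inside one phrase to a word outside the other) is an assumption made for illustration, and the names is_allowed, spans and allowed_phrase_pairs are hypothetical; the summary above does not spell out the exact constraint used in the paper.

```python
def is_allowed(points, src_span, tgt_span):
    """Check a candidate phrase pair against high-confidence points.

    Spans are half-open (start, end) word index ranges. The rule is an
    assumed one: a point (i, j) must fall inside both spans or outside both.
    """
    (i1, i2), (j1, j2) = src_span, tgt_span
    for i, j in points:
        in_src = i1 <= i < i2
        in_tgt = j1 <= j < j2
        if in_src != in_tgt:
            return False
    return True


def spans(n):
    """All contiguous half-open spans over a sentence of n words."""
    return [(a, b) for a in range(n) for b in range(a + 1, n + 1)]


def allowed_phrase_pairs(n_src, n_tgt, points):
    """Phrase pairs that survive the hard constraint."""
    return [
        (s, t)
        for s in spans(n_src)
        for t in spans(n_tgt)
        if is_allowed(points, s, t)
    ]


# Toy sentence pair: 3 source words, 3 target words, 2 high-confidence points.
points = {(0, 0), (2, 2)}
unconstrained = len(spans(3)) * len(spans(3))  # 6 * 6 = 36 candidate pairs
kept = allowed_phrase_pairs(3, 3, points)
print(unconstrained, len(kept))
```

Even on this toy example, most candidate phrase pairs are ruled out before EM ever considers them, which is the source of the scalability gain described above.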
The high-confidence alignments are built from the intersection of two unidirectional word-to-phrase alignments, which generally yields a set of alignment points with high precision and low recall. Further alignment points are added by aligning identical words in the two languages, and by aligning word pairs in the sentence pair that match an entry in a dictionary.
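The construction of the high-confidence set can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes each directional alignment has already been reduced to a set of (source index, target index) points, that the dictionary is a set of (source word, target word) pairs, and the function name high_confidence_points is hypothetical.

```python
def high_confidence_points(src, tgt, src_to_tgt, tgt_to_src, dictionary=frozenset()):
    """Return a set of (i, j) points, with i indexing src and j indexing tgt."""
    # Points proposed by both directional alignments: high precision, low recall.
    points = set(src_to_tgt) & set(tgt_to_src)
    for i, s in enumerate(src):
        for j, t in enumerate(tgt):
            # Add identical tokens (names, numbers) and dictionary matches.
            if s == t or (s, t) in dictionary:
                points.add((i, j))
    return points


# Toy data (invented for illustration only).
src = ["la", "casa", "de", "Maria", "2006"]
tgt = ["Maria", "'s", "house", "2006"]
src_to_tgt = {(1, 2), (2, 1), (3, 0)}
tgt_to_src = {(3, 0), (0, 2)}
dictionary = {("casa", "house")}

points = high_confidence_points(src, tgt, src_to_tgt, tgt_to_src, dictionary)
print(sorted(points))
```

In this toy case the intersection alone yields a single point, and the identical-word and dictionary heuristics recover two more, which shows how the extra heuristics compensate for the low recall of the intersection.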