Marcu and Wong, EMNLP 2002
Citation
Marcu, D., & Wong, W. (2002). A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP, pp. 133–139.
Online version
Summary
This work presents a phrase-to-phrase alignment model for statistical machine translation. Alignment models are generally word-to-phrase: each target word can be aligned with at most one source word. This work removes that restriction and allows n-to-m alignments between words in the source and words in the target. The main contribution is showing that the model outperforms IBM Model 4 in terms of translation quality in machine translation systems. The main drawback is the high cost of the training procedure.
Model
In word-to-phrase alignment models, the generative process simply assigns a lexical probability to a given target word being aligned with a source word. In this work, the generative process first clusters words into phrases. It constructs an ordered set of phrases in the target language, an ordered set of phrases in the source language, and an alignment between them that indicates which target phrase is paired with which source phrase. The process consists of two steps:
- First, the number of components (phrase pairs) is chosen, and each phrase pair is generated independently.
- Then, an ordering for the source phrases is chosen, and the source and target phrases are aligned one to one.
The number of components is parametrized by a geometric distribution with a stop parameter.
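For reference, a geometric distribution over the number of phrase pairs can be written as below. The symbols $K$ (number of phrase pairs) and $p_{\text{stop}}$ (stop parameter) are names assumed here, since the original notation is not available in this summary. The support starting at $K = 1$ is also an assumption (at least one phrase pair per sentence pair).

```latex
P(K) = p_{\text{stop}} \, (1 - p_{\text{stop}})^{K-1}, \qquad K = 1, 2, \dots
```

Under this form, smaller values of $p_{\text{stop}}$ put more mass on analyses with many phrase pairs.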
Phrase pairs are drawn from an unknown multinomial distribution.
A simple position-based distortion model scores the chosen ordering.
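Finally, the joint probability of a sentence pair and its phrase alignment is the product of these components: the geometric probability of the number of phrase pairs, the multinomial probability of each phrase pair, and the distortion probability of each alignment link.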
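The factorization implied by the generative story above can be sketched as follows. All notation is assumed for illustration: $(s_k, t_k)$ is the $k$-th source/target phrase pair, $a$ is the alignment, $\theta$ is the phrase-pair multinomial, $P_G$ is the geometric term, and $d(\cdot)$ stands in for the distortion model, whose exact form is not reproduced in this summary.

```latex
P\big(\{(s_k, t_k)\}_{k=1}^{K},\, a\big)
  = P_G(K;\, p_{\text{stop}})
    \prod_{k=1}^{K} \theta(s_k, t_k)
    \prod_{k=1}^{K} d(a_k)
```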
In the experiments, the stop parameter and the distortion parameter were set to 0.1 and 0.85, respectively.
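As a minimal sketch of how such a joint score could be computed, the Python below multiplies (in log space) a geometric term, a multinomial phrase-pair term, and a distortion term. The names `log_geometric`, `log_joint`, `theta`, and `toy_distortion` are hypothetical. The distortion function is supplied by the caller, and the toy version used in the example is an illustrative assumption, not the form used in the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# One aligned phrase pair: (source phrase, target phrase,
# source phrase position, target phrase position).
Link = Tuple[str, str, int, int]


def log_geometric(k: int, p_stop: float) -> float:
    """log P(K = k) for a geometric distribution over k = 1, 2, ..."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return math.log(p_stop) + (k - 1) * math.log(1.0 - p_stop)


def log_joint(
    links: List[Link],
    theta: Dict[Tuple[str, str], float],
    p_stop: float,
    log_distortion: Callable[[int, int], float],
) -> float:
    """Log joint probability of a set of aligned phrase pairs."""
    total = log_geometric(len(links), p_stop)
    for src, tgt, src_pos, tgt_pos in links:
        total += math.log(theta[(src, tgt)])         # phrase-pair term
        total += log_distortion(src_pos, tgt_pos)    # distortion term
    return total


# Toy phrase-pair probabilities (made-up values for illustration only).
theta = {("la maison", "the house"): 0.02, ("bleue", "blue"): 0.05}
links = [("la maison", "the house", 0, 0), ("bleue", "blue", 1, 1)]


def toy_distortion(src_pos: int, tgt_pos: int) -> float:
    # Assumed penalty: log(0.85) per position of displacement.
    return abs(src_pos - tgt_pos) * math.log(0.85)


score = log_joint(links, theta, p_stop=0.1, log_distortion=toy_distortion)
print(score)
```

Working in log space avoids underflow when many small probabilities are multiplied, which matters once sentences contain more than a handful of phrase pairs.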
Experiments
The translation quality of phrase-based machine translation systems was evaluated using BLEU.
The Hansards dataset was used, with around 1.1 million training sentence pairs; 500 unseen test sentences were used to test the system. Training sentences were limited to a length of 20 words.
The model described in this paper is compared to IBM Model 4.
Model | BLEU |
---|---|
IBM Model 4 | 0.2158 |
Phrase-to-phrase | 0.2325 |
This model outperforms IBM Model 4 in this experiment, by 0.0167 BLEU. However, it is worth mentioning that when the alignment templates proposed in Och et al., 2004 are used to generate phrase-to-phrase alignments from word alignments, IBM Model 4 still works better than this model. Furthermore, this model is more computationally expensive and generally does not work with long sentences.
Related Work
Previous work had already built phrase-to-phrase alignments, but it was based on heuristics. The work in Och et al., 2004 presents an alignment template approach that generates phrase-to-phrase alignments, called phrase pairs, from word-to-phrase alignments; it is commonly used in phrase-based machine translation systems.
The main drawback of this work is the computational cost of running the Expectation Maximization algorithm over all possible phrases and all possible alignments between those phrases. This problem was later tackled using algorithms such as Gibbs sampling. Another alternative is to constrain the space of possible latent variables, which is done in Birch et al., StatMT 2006.
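To give a rough sense of why exhaustive EM is expensive, the sketch below counts the phrase alignments of a single sentence pair under simplifying assumptions that are not taken from the paper: phrases are contiguous, both sides are split into the same number of phrases, and phrases are aligned one to one. The function name `count_phrase_alignments` is hypothetical.

```python
from math import comb, factorial


def count_phrase_alignments(src_len: int, tgt_len: int) -> int:
    """Count (source segmentation, target segmentation, one-to-one alignment)
    triples for a sentence pair, assuming contiguous phrases."""
    total = 0
    for k in range(1, min(src_len, tgt_len) + 1):
        # Ways to cut n words into k contiguous phrases: C(n - 1, k - 1).
        src_segmentations = comb(src_len - 1, k - 1)
        tgt_segmentations = comb(tgt_len - 1, k - 1)
        # Ways to pair k source phrases with k target phrases: k!.
        total += src_segmentations * tgt_segmentations * factorial(k)
    return total


for n in (3, 5, 10, 20):
    print(n, count_phrase_alignments(n, n))
```

For a three-word sentence pair this gives 15 analyses, and the count grows faster than exponentially with sentence length, which is consistent with the need for sampling or for constraints on the latent space.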