Marcu and Wong, EMNLP 2002

Citation

Marcu, D., & Wong, W. (2002). A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP, pp. 133–139.

Online version

pdf

Summary

This work presents a phrase-to-phrase alignment model for statistical machine translation. Alignment models are generally word-to-phrase: each target word can be aligned with at most one source word. This work removes that restriction and allows n-to-m alignments between words in the source and in the target. The main contribution of this work is showing that the model outperforms IBM Model 4 in terms of translation quality in a machine translation system. The main drawback is the high computational cost of the training procedure.

Model

In word-to-phrase alignment models, the generative process assigns a lexical probability to each target word being aligned with a given source word. In this work, the generative process instead first groups words into phrases. It constructs an ordered set of phrases $t_{1:m}$ in the target language, an ordered set of phrases $s_{1:n}$ in the source language, and an alignment between phrases $a=\{(j,k)\}$, where each pair $(j,k)$ indicates that target phrase $t_{j}$ is aligned with source phrase $s_{k}$. The process consists of two steps:

First, the number $l$ of phrase pairs is chosen, and each of the $l$ phrase pairs is generated independently.
Then, an ordering of the source phrases is chosen, and the source and target phrases are aligned one-to-one.
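
The two-step generative story can be sketched as a small sampler. This is a minimal illustration, not the authors' implementation: the toy table `THETA` standing in for $\theta_{J}$, the English-French phrases, and the function names are hypothetical, and drawing the source ordering uniformly at random is a simplifying assumption.

```python
import random

# Hypothetical toy stand-in for theta_J: probabilities of (target, source) phrase pairs.
THETA = {
    ("the house", "la maison"): 0.5,
    ("is", "est"): 0.3,
    ("small", "petite"): 0.2,
}


def sample_num_pairs(p_stop, rng):
    """Draw l >= 1 from a geometric distribution with stop probability p_stop."""
    l = 1
    while rng.random() >= p_stop:  # keep going with probability 1 - p_stop
        l += 1
    return l


def generate(theta, p_stop, rng):
    # Step 1: choose l, then draw l phrase pairs independently from theta.
    l = sample_num_pairs(p_stop, rng)
    drawn = rng.choices(list(theta), weights=list(theta.values()), k=l)
    targets = [t for t, _ in drawn]
    sources = [s for _, s in drawn]

    # Step 2: choose an ordering of the source phrases (uniform here, an assumption).
    order = list(range(l))
    rng.shuffle(order)  # order[k] = index of the pair whose source lands at position k
    source_seq = [sources[i] for i in order]

    # One-to-one alignment: target phrase j is linked to the position k of its source.
    alignment = [(j, order.index(j)) for j in range(l)]
    return targets, source_seq, alignment


rng = random.Random(0)
t, s, a = generate(THETA, p_stop=0.5, rng=rng)
print(t)
print(s)
print(a)
```

The usage example uses $p_{\$}=0.5$ only to keep the sampled sentences short; every returned alignment is a permutation, so each target phrase is linked to exactly one source phrase.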

The choice of $l$ is parametrized using a geometric distribution $P_{G}$ , with the stop parameter $p_{\$}$ :

$P(l)=P_{G}(l;p_{\$})=p_{\$}\times (1-p_{\$})^{l-1}$
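
As a quick numerical check of this prior, the sketch below evaluates $P_{G}(l;p_{\$})$ with the value $p_{\$}=0.1$ reported for the experiments; the function name is illustrative. Shorter phrase sequences are always more probable, and the expected number of phrase pairs is $1/p_{\$}=10$.

```python
def geometric_pmf(l: int, p_stop: float) -> float:
    """P_G(l; p_stop) = p_stop * (1 - p_stop) ** (l - 1), defined for l >= 1."""
    if l < 1:
        raise ValueError("l must be at least 1")
    return p_stop * (1.0 - p_stop) ** (l - 1)


P_STOP = 0.1  # value of p_$ reported for the experiments
print([round(geometric_pmf(l, P_STOP), 4) for l in range(1, 6)])
# -> [0.1, 0.09, 0.081, 0.0729, 0.0656]
```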

Phrase pairs are drawn from an unknown multinomial distribution $\theta _{J}$ .

A simple position-based distortion model is used, where:

$P(a|[t,s])\propto \prod _{a_{i}\in a}\delta (a_{i})$

$\delta (a_{i}=(j,k))=b^{|pos(t_{j})-pos(s_{k})\times s|}$
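
The distortion factor can be computed directly from phrase positions. In this sketch, $pos(\cdot)$ is taken to be the word index at which a phrase starts, and the factor $s$ in the exponent is treated as a plain scaling constant (for example, a target/source length ratio); both readings are assumptions, as are the function names. Because $b=0.85<1$, links that move a phrase far from its scaled position receive a smaller weight.

```python
def distortion_weight(pos_t: float, pos_s: float, scale: float, b: float = 0.85) -> float:
    """delta(a_i) = b ** |pos(t_j) - pos(s_k) * scale| for a single link a_i = (j, k)."""
    return b ** abs(pos_t - pos_s * scale)


def alignment_score(alignment, pos_t, pos_s, scale=1.0, b=0.85):
    """Unnormalised P(a | [t, s]): product of delta(a_i) over all links (j, k)."""
    score = 1.0
    for j, k in alignment:
        score *= distortion_weight(pos_t[j], pos_s[k], scale, b)
    return score


# Two phrases per side, starting at word 0 and word 2 in both sentences.
pos_t, pos_s = [0, 2], [0, 2]
monotone = alignment_score([(0, 0), (1, 1)], pos_t, pos_s)
swapped = alignment_score([(0, 1), (1, 0)], pos_t, pos_s)
print(monotone, swapped)  # swapped equals 0.85 ** 4, roughly 0.522
```

The monotone alignment keeps its full weight of 1, while swapping the two phrases costs a factor of $b^{2}$ per link.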

Finally, the joint probability model for aligning sentences consisting of $l$ phrase pairs is given by:

$P([t,s],a)=P_{G}(l;p_{\$})P(a|[t,s])\prod _{[t,s]}\theta _{J}([t,s])$
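
Putting the three factors together, the following sketch scores one sentence pair with a given phrase segmentation and alignment, working in log space to avoid underflow. The phrase-pair table, positions, and function name are hypothetical. Since $P(a|[t,s])$ is only defined up to proportionality above, the sketch plugs in the unnormalised distortion product, so it returns a log score rather than an exactly normalised log probability. It uses $p_{\$}=0.1$ and $b=0.85$ as defaults.

```python
import math


def joint_log_score(t, s, alignment, theta, pos_t, pos_s, scale=1.0, p_stop=0.1, b=0.85):
    """log P_G(l) + log prod delta(a_i) + sum of log theta_J over the aligned phrase pairs."""
    l = len(alignment)
    # Geometric prior on the number of phrase pairs.
    score = math.log(p_stop) + (l - 1) * math.log(1.0 - p_stop)
    for j, k in alignment:
        # Unnormalised distortion: log(b ** |pos(t_j) - pos(s_k) * scale|).
        score += abs(pos_t[j] - pos_s[k] * scale) * math.log(b)
        # Phrase-pair probability theta_J([t_j, s_k]).
        score += math.log(theta[(t[j], s[k])])
    return score


theta = {("the house", "la maison"): 0.4, ("is small", "est petite"): 0.1}
t = ["the house", "is small"]
s = ["la maison", "est petite"]
print(joint_log_score(t, s, [(0, 0), (1, 1)], theta, pos_t=[0, 2], pos_s=[0, 2]))
```

For this monotone example the distortion term is zero, so the score reduces to $\log(0.1 \times 0.9 \times 0.4 \times 0.1)$.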

In the experiments, the parameters $p_{\$}$ and $b$ were set to 0.1 and 0.85, respectively.

Experiments

The model was evaluated by measuring the translation quality of phrase-based machine translation systems, using BLEU as the evaluation metric.

The Hansards corpus was used: around 1.1 million sentence pairs were used for training, and 500 unseen sentences were used to test the system. A limit of 20 characters was imposed on the lengths of the sentences in the training corpus.

The proposed model is compared with IBM Model 4.

Model	BLEU
IBM Model 4	0.2158
Phrase-to-phrase	0.2325

In this experiment, the phrase-to-phrase model outperforms IBM Model 4 by about 1.7 BLEU points (0.2325 vs. 0.2158).

Related Work

Previous work had already produced phrase-to-phrase alignments, but these were based on heuristics. Och et al. (2004) present an alignment template approach that extracts phrase-to-phrase alignments, called phrase pairs, from word-to-phrase alignments; this approach is commonly used in phrase-based machine translation systems.

The main drawback of this work is the computational cost of running the EM algorithm over all possible phrases and all possible alignments between those phrases. This problem was later tackled using sampling algorithms such as Gibbs sampling.
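
To give a sense of why exhaustive EM is expensive, the following sketch counts the joint segmentations and one-to-one alignments the model could in principle consider for a single sentence pair. It assumes that every word belongs to exactly one contiguous phrase and that both sentences are split into the same number $l$ of phrases: $n$ words can be split into $l$ contiguous phrases in $\binom{n-1}{l-1}$ ways, and $l$ phrases per side can be aligned one-to-one in $l!$ ways. The function name is illustrative.

```python
from math import comb, factorial


def count_phrase_alignments(n: int, m: int) -> int:
    """Count (target segmentation, source segmentation, one-to-one alignment) triples
    for an n-word target and m-word source sentence, assuming contiguous phrases that
    cover every word and the same number l of phrases on both sides."""
    total = 0
    for l in range(1, min(n, m) + 1):
        total += comb(n - 1, l - 1) * comb(m - 1, l - 1) * factorial(l)
    return total


for length in (5, 10, 20):
    print(length, count_phrase_alignments(length, length))
```

For $n=m=5$ the count is already 753, and because the $l=n$ term alone contributes $n!$, it grows at least factorially with sentence length, which is what makes summing over all configurations in every EM iteration impractical.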
