== Citation ==

Alexandra Birch, Chris Callison-Burch, and Miles Osborne. 2006. Constraining the phrase-based, joint probability statistical translation model. In The Conference for the Association for Machine Translation in the Americas.

== Online version ==

[http://dl.acm.org/ft_gateway.cfm?id=1654675&type=pdf&CFID=70405940&CFTOKEN=69587577 pdf]

== Summary ==

The model proposed in [[Marcus and Wong, EMNLP 2002]] provides a strong framework for phrase-to-phrase alignments, but its applicability is hamstrung by the computational complexity of running [[UsesMethod:: Expectation Maximization | EM]] over the large space of latent variables generated from all possible phrases and alignments.

This [[Category::paper | work]] describes a phrase-to-phrase alignment model that uses word-to-phrase alignments to constrain the space of phrasal alignments, improving both the scalability of the model and its performance on the [[AddressesProblem::Machine Translation]] task.

== Description of the method ==

The joint model proposed in [[Marcus and Wong, EMNLP 2002]] searches the space of all possible latent variables (phrases and alignments between phrases) during the EM algorithm, which is computationally expensive. The goal of this method is to define hard constraints on the possible latent variables using a high-confidence set of word alignments.

The high-confidence alignments are built by intersecting two unidirectional word-to-phrase alignments, which generally produces a set of alignment points with high precision and low recall. Additional alignment points are then added by aligning words that are identical in the two languages, and by aligning word pairs for which a dictionary entry matches both sides of the sentence pair.
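
The paper does not give pseudocode for this step; the following is a minimal sketch of how such a high-confidence alignment could be assembled. The function name, the data structures, and the use of plain string identity for the identical-word heuristic are illustrative assumptions, not details taken from the paper.

<pre>
# Sketch only: all names and data structures below are assumptions made to
# illustrate the idea described above.

def high_confidence_links(src_to_tgt, tgt_to_src, src_words, tgt_words, dictionary):
    """Return a set of (src_index, tgt_index) high-confidence alignment points.

    src_to_tgt, tgt_to_src: sets of (src_index, tgt_index) pairs taken from the
    two unidirectional alignments; dictionary: a set of (src_word, tgt_word)
    translation pairs.
    """
    # Intersecting the two unidirectional alignments keeps only the points that
    # both directions agree on: high precision, low recall.
    links = src_to_tgt & tgt_to_src

    # Add extra points for identical words (numbers, names, punctuation) and for
    # word pairs that match a bilingual dictionary entry on both sides.
    for i, sw in enumerate(src_words):
        for j, tw in enumerate(tgt_words):
            if sw == tw or (sw, tw) in dictionary:
                links.add((i, j))
    return links
</pre>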

Using this high-confidence alignment, the space of possible phrase pairs is limited to pairs that contain at least one alignment point between their source and target words, similarly to the alignment templates defined in [http://dl.acm.org/ft_gateway.cfm?id=1105589&type=pdf&CFID=70405940&CFTOKEN=69587577 Och et al, 2004]. To further constrain the phrase space, a phrase pair must also occur at least a given number of times in the training corpus to be considered. These constraints are applied both when the parameters of the model are initialized and during the E-step of the [[UsesMethod:: Expectation Maximization | EM]] algorithm.
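
As a rough illustration, a candidate phrase pair could be filtered as below. The helper name, the representation of the counts, and the value of the occurrence threshold are assumptions made for the sketch; the paper only states that a minimum number of occurrences is required.

<pre>
from collections import Counter

def allowed_phrase_pair(src_phrase, tgt_phrase, src_span, tgt_span,
                        links, pair_counts, min_count=5):
    """Decide whether a candidate phrase pair may enter the joint model.

    src_phrase, tgt_phrase: tuples of words; src_span, tgt_span: (start, end)
    index ranges of the phrases in the sentence pair; links: high-confidence
    alignment points as (src_index, tgt_index) pairs; pair_counts: a Counter of
    (src_phrase, tgt_phrase) occurrences in the training corpus.  The default
    min_count is illustrative, not a value from the paper.
    """
    (s_start, s_end), (t_start, t_end) = src_span, tgt_span

    # Constraint 1: at least one high-confidence alignment point must link a
    # word inside the source span to a word inside the target span.
    has_link = any(s_start <= i < s_end and t_start <= j < t_end
                   for i, j in links)

    # Constraint 2: the phrase pair must occur often enough in the corpus.
    frequent = pair_counts[(src_phrase, tgt_phrase)] >= min_count

    return has_link and frequent

# Tiny usage example with made-up data.
counts = Counter({(("das", "haus"), ("the", "house")): 7})
ok = allowed_phrase_pair(("das", "haus"), ("the", "house"),
                         (0, 2), (0, 2), {(0, 0), (1, 1)}, counts)
</pre>

Because the filter is applied both at initialization and in the E-step, excluded phrase pairs never receive any probability mass, which is what reduces the cost of running EM.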

Another improvement to the model is the introduction of lexical weights, which are computed for each phrase pair from the high-confidence alignment points between the words of the phrase pair.
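
The exact formulation is not reproduced here; the sketch below follows the commonly used lexical weight of Koehn et al. (2003), computed over the high-confidence alignment points inside the phrase pair. The function name, the probability table, and the handling of unaligned words are assumptions.

<pre>
def lexical_weight(src_phrase, tgt_phrase, links, w, null_token="NULL"):
    """Lexical weight of a phrase pair in the style of Koehn et al. (2003).

    src_phrase, tgt_phrase: lists of words; links: set of (i, j) alignment
    points between positions inside the two phrases; w: dict mapping
    (src_word, tgt_word) pairs to word translation probabilities.
    """
    weight = 1.0
    for i, sw in enumerate(src_phrase):
        aligned = [j for (i2, j) in links if i2 == i]
        if aligned:
            # Average the word translation probabilities over all target words
            # this source word is aligned to.
            weight *= sum(w.get((sw, tgt_phrase[j]), 0.0) for j in aligned) / len(aligned)
        else:
            # Unaligned source words are scored against the empty (NULL) word.
            weight *= w.get((sw, null_token), 0.0)
    return weight
</pre>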

== Experiments ==

Tests were conducted by evaluating translation quality with BLEU. The [[UsesDataset::EUROPARL]] German-English data was used, which contains around 1.6 million training sentences. However, due to the limitations of the model, only up to 40000 training sentences (10000, 20000, and 40000) were used. The test set was composed of 1755 sentences with lengths between 5 and 15 words.

{| class="wikitable" border="1"
|+ BLEU scores
|-
! Model
! 10000 sentences
! 20000 sentences
! 40000 sentences
|-
| Joint model ([[Marcus and Wong, EMNLP 2002]])
| 21.69
| 23.61
| 25.52
|-
| + lex
| 22.79
| 24.33
| 25.99
|-
| + lex + ident
| 23.30
| 24.90
| 26.12
|-
| + lex + ident + dict
| 23.20
| 24.96
| 26.13
|}

These results improve on those obtained with the model proposed by [[Marcus and Wong, EMNLP 2002]]. The first improvement comes from using lexical weighting in addition to the phrase translation probabilities (+ lex). Adding alignment points for identical words further improves the alignments and the resulting scores (+ lex + ident). Finally, adding a dictionary (+ lex + ident + dict) gives a small additional gain for the two larger training sets, though not for the smallest one.

== Related Work ==

This work extends [[Marcus and Wong, EMNLP 2002]] by constraining the latent variable space using a high-confidence word-to-word alignment.

The algorithm used to constrain the space of possible phrase pairs is based on the work in [http://dl.acm.org/ft_gateway.cfm?id=1105589&type=pdf&CFID=70405940&CFTOKEN=69587577 Och et al, 2004], which is commonly used in [[Phrase Extraction]] for [[Machine Translation]].
