Difference between revisions of "Vogal et al, COLING 1996"

Revision as of 10:39, 27 September 2011

Citation

Vogel, S., Ney, H., & Tillmann, C. (1996). HMM-based word alignment in statistical translation. In Proceedings of the 16th conference on Computational linguistics - Volume 2, COLING ’96, pp. 836–841, Stroudsburg, PA, USA. Association for Computational Linguistics.

Online version

ACM

Summary

This is a highly influential work on word alignment. It extends IBM Models 1 and 2, which model lexical translation probabilities and absolute distortion probabilities, by also modeling relative distortion.

The relative distortion is modeled with a first-order Hidden Markov Model, in which the alignment probability of each word depends on the alignment of the previous word.

The results indicate that modeling relative distortion can improve the overall quality of word alignments.

Model

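IBM Model 2 models the absolute distortion of words in sentence pairs, i.e. alignment probabilities that depend on absolute word positions. However, alignments have a strong tendency to preserve local neighborhoods: words that are close together in the source sentence tend to be translated into words that are close together in the target sentence.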

This model uses a first-order Hidden Markov Model to restructure the alignment model $Pr(t,a|s)$ of IBM Model 2 so that it includes first-order dependencies between alignments. It defines the probability of a source sentence $s_{1}^{I}$, with length $I$, being translated into a target sentence $t_{1}^{J}$, with length $J$, with the alignment $a_{1}^{J}$ as:

$Pr(t,a|s)=\prod _{j=1}^{J}{tr(t_{j}|s_{a(j)})\,Pr_{a}(a(j)|a(j-1),I)}$

Where the alignment probability $Pr_{a}(a(j)|a(j-1),I)$ is calculated as:

$Pr_{a}(i|i',I)={\frac {c(i-i')}{\sum _{k=1}^{I}c(k-i')}}$

In this formulation, the alignment (distortion) probability does not depend on the absolute word positions but only on the jump width $i-i'$, where $c(i-i')$ is a non-negative weight for a jump of that width.
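To make the two formulas concrete, the Python sketch below implements $Pr_a(i|i',I)$ from jump-width weights and multiplies it with the lexical probabilities along one fixed alignment. The weights in `c`, the translation table `tr` and the function names are hypothetical toy choices, and scoring the first alignment (which has no predecessor) with a uniform $1/I$ is an assumption the summary does not state. The sketch evaluates only the product over target positions $j$.

```python
def alignment_prob(i, i_prev, I, c):
    """Pr_a(i | i_prev, I) = c(i - i_prev) / sum_{k=1..I} c(k - i_prev).

    Positions are 1-based source positions. c maps a jump width to a
    non-negative weight; widths missing from c get weight 0 (an assumption),
    and at least one reachable width must have a positive weight.
    """
    denom = sum(c.get(k - i_prev, 0.0) for k in range(1, I + 1))
    return c.get(i - i_prev, 0.0) / denom


def joint_prob(t, s, a, tr, c):
    """Product over j of tr(t_j | s_a(j)) * Pr_a(a(j) | a(j-1), I).

    a[j] is the 1-based source position aligned to target word t[j].
    The first alignment is scored with a uniform 1/I (assumption).
    """
    I = len(s)
    prob = 1.0
    for j, (word, i) in enumerate(zip(t, a)):
        dist = 1.0 / I if j == 0 else alignment_prob(i, a[j - 1], I, c)
        prob *= tr[(word, s[i - 1])] * dist
    return prob


# Hypothetical jump-width weights: forward jumps of +1 are favored.
c = {-1: 1.0, 0: 2.0, 1: 4.0, 2: 1.0}

# The same jump width (+1) taken from two different positions in a 3-word sentence:
print(alignment_prob(2, 1, 3, c), alignment_prob(3, 2, 3, c))
# At the sentence end only jumps -2, -1 and 0 are possible, so the denominator differs:
print(alignment_prob(3, 3, 3, c))

# Hypothetical 2-word sentence pair and lexical table tr(t | s).
s = ["das", "Haus"]
t = ["the", "house"]
tr = {("the", "das"): 0.7, ("house", "das"): 0.1,
      ("the", "Haus"): 0.1, ("house", "Haus"): 0.8}
print(joint_prob(t, s, [1, 2], tr, c))
```

The unnormalized weight depends only on the jump width, but the denominator sums over the $I$ source positions, so the same jump can receive a different probability close to the sentence boundary.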

Viterbi Alignment

The alignment probability $Pr(a|t,s)$ for a given sentence pair is given by:

$Pr(a|t,s)={\frac {Pr(t,a|s)}{Pr(t|s)}}$
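The denominator $Pr(t|s)$ is the sum of $Pr(t,a|s)$ over all $I^{J}$ possible alignments. The Python sketch below computes it with the standard HMM forward recursion (a general HMM technique, not described in this summary) and checks it against brute-force enumeration. The toy tables `tr`, `p_init` and `p_align` are hypothetical, and `p_init`, the probability of the first alignment, is an assumption.

```python
import itertools

# Hypothetical toy model with I = 2 source words and J = 2 target words.
s = ["das", "Haus"]
t = ["the", "house"]
tr = {("the", "das"): 0.7, ("house", "das"): 0.1,
      ("the", "Haus"): 0.1, ("house", "Haus"): 0.8}   # tr(t | s)
p_init = {1: 0.5, 2: 0.5}                            # assumed distribution of a(1)
p_align = {(1, 1): 0.3, (2, 1): 0.7,                  # p_align[(i, i_prev)] = Pr_a(i | i_prev, I)
           (1, 2): 0.4, (2, 2): 0.6}


def joint_prob(a):
    """Pr(t, a | s) for one alignment a of 1-based source positions."""
    prob = p_init[a[0]] * tr[(t[0], s[a[0] - 1])]
    for j in range(1, len(t)):
        prob *= p_align[(a[j], a[j - 1])] * tr[(t[j], s[a[j] - 1])]
    return prob


def forward_prob():
    """Pr(t | s) summed over all alignments with the forward recursion, O(I^2 J)."""
    positions = range(1, len(s) + 1)
    # alpha[i]: total probability of t_1..t_j over partial alignments ending in a(j) = i
    alpha = {i: p_init[i] * tr[(t[0], s[i - 1])] for i in positions}
    for word in t[1:]:
        alpha = {i: tr[(word, s[i - 1])] * sum(p_align[(i, ip)] * alpha[ip] for ip in positions)
                 for i in positions}
    return sum(alpha.values())


def brute_force_prob():
    """The same sum by enumerating all I**J alignments (only feasible for tiny inputs)."""
    return sum(joint_prob(a) for a in itertools.product(range(1, len(s) + 1), repeat=len(t)))


def posterior(a):
    """Pr(a | t, s) = Pr(t, a | s) / Pr(t | s)."""
    return joint_prob(a) / forward_prob()


print(forward_prob(), brute_force_prob())
print(posterior([1, 2]))
```

Enumeration grows as $I^{J}$, while the forward recursion stays polynomial because each alignment depends only on its predecessor.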

The Viterbi alignment is the alignment with the highest $Pr(a|t,s)$. Since $Pr(t|s)$ does not depend on $a$, this is also the alignment that maximizes $Pr(t,a|s)$. In the previous alignment models, the independence assumptions allow the Viterbi alignment to be found in polynomial time by choosing the most probable source position for each target word separately. For the HMM-based model this is no longer possible, because of the first-order dependencies between alignments. The Viterbi alignment can still be computed in polynomial time, with complexity $O(I^{2}J)$, using a dynamic programming algorithm similar to Viterbi decoding, which is proposed in this work. The algorithm defines the partial alignment probability $Q(i,j)$ as:

$Q(i,j)=tr(t_{j}|s_{i})\max _{i'=1}^{I}Pr_{a}(i|i',I)\,Q(i',j-1)$

$Q(i,j)$ is the probability of the best partial alignment of the target words $t_{1}$ to $t_{j}$ in which $t_{j}$ is aligned to the source word $s_{i}$. The recursion is valid because each word alignment depends only on the previous alignment; the full Viterbi alignment is recovered from the position $i$ that maximizes $Q(i,J)$ by tracing back the maximizing positions $i'$.
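A minimal Python sketch of the $Q(i,j)$ recursion, with back-pointers to recover the alignment and a brute-force check. The toy tables are hypothetical, the function names are illustrative, and the initialization $Q(i,1)=p_{init}(i)\,tr(t_{1}|s_{i})$ is an assumption, since the summary does not state how the first target position is handled.

```python
import itertools

# Hypothetical toy model (1-based source positions).
s = ["das", "Haus"]
t = ["the", "house"]
tr = {("the", "das"): 0.7, ("house", "das"): 0.1,
      ("the", "Haus"): 0.1, ("house", "Haus"): 0.8}   # tr(t | s)
p_init = {1: 0.5, 2: 0.5}                            # assumed distribution of a(1)
p_align = {(1, 1): 0.3, (2, 1): 0.7,                  # p_align[(i, i_prev)] = Pr_a(i | i_prev, I)
           (1, 2): 0.4, (2, 2): 0.6}


def joint_prob(a):
    """Pr(t, a | s) for one alignment a."""
    prob = p_init[a[0]] * tr[(t[0], s[a[0] - 1])]
    for j in range(1, len(t)):
        prob *= p_align[(a[j], a[j - 1])] * tr[(t[j], s[a[j] - 1])]
    return prob


def viterbi_alignment():
    """Q(i, j) = tr(t_j | s_i) * max_i' Pr_a(i | i', I) * Q(i', j-1), in O(I^2 J)."""
    positions = range(1, len(s) + 1)
    Q = [{i: p_init[i] * tr[(t[0], s[i - 1])] for i in positions}]  # Q(i, 1)
    back = [None]                                                     # back[j][i] = best i'
    for j in range(1, len(t)):
        q_j, b_j = {}, {}
        for i in positions:
            best = max(positions, key=lambda ip: p_align[(i, ip)] * Q[j - 1][ip])
            q_j[i] = tr[(t[j], s[i - 1])] * p_align[(i, best)] * Q[j - 1][best]
            b_j[i] = best
        Q.append(q_j)
        back.append(b_j)
    # Best final position, then follow the back-pointers to the first word.
    last = max(positions, key=lambda i: Q[-1][i])
    alignment = [last]
    for j in range(len(t) - 1, 0, -1):
        alignment.append(back[j][alignment[-1]])
    alignment.reverse()
    return alignment, Q[-1][last]


def brute_force_viterbi():
    """Argmax over all I**J alignments, for checking tiny examples."""
    candidates = (list(a) for a in itertools.product(range(1, len(s) + 1), repeat=len(t)))
    best = max(candidates, key=joint_prob)
    return best, joint_prob(best)


print(viterbi_alignment())
print(brute_force_viterbi())
```

Each of the $J$ steps compares $I$ predecessors for each of the $I$ positions, which gives the $O(I^{2}J)$ cost stated above.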

Corpora

Tests were performed using the following corpora:

Corpus	Language Pair	Words	Vocabulary	Description
Avalanche Bulletins	French-German	French: 62849, German: 44805	French: 1993, German: 2265	Avalanche bulletins published by the Swiss Federal Institute for Snow and Avalanche Research
Verbmobil Corpus	Spanish-English	Spanish: 13768, English: 15888	Spanish: 2008, English: 1830	Spontaneous spoken dialog in the domain of appointment scheduling
EuTrans Corpus	German-English	German: 150279, English: 154727	German: 4017, English: 2443	Typical phrases in the tourism and travel domain

Training

This work compares the HMM-based alignment model with IBM Model 2. The training setup for both models starts with 10 EM iterations of IBM Model 1 to obtain an initial estimate of the lexical translation probabilities $tr(t_{j}|s_{a(j)})$. These estimates are used to initialize both IBM Model 2 and the HMM-based model, each of which is then trained with 5 further EM iterations.
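As a hedged illustration of the initialization step, the Python sketch below runs EM for IBM Model 1 to estimate $tr(t|s)$. The three-sentence German-English corpus is hypothetical, the function name `train_ibm1` is illustrative, and, unlike IBM Model 1 as usually defined, the sketch does not add an empty (NULL) source word.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """EM for IBM Model 1 lexical probabilities tr(t | s).

    corpus: list of (source_words, target_words) sentence pairs.
    Returns a dict mapping (target_word, source_word) -> tr(target_word | source_word).
    Simplification: no empty (NULL) source word is added.
    """
    target_vocab = {w for _, tgt in corpus for w in tgt}
    uniform = 1.0 / len(target_vocab)
    tr = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(t, s)
        total = defaultdict(float)  # expected counts c(s)
        # E-step: share each target word among the source words of its sentence
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(tr[(tw, sw)] for sw in src)
                for sw in src:
                    frac = tr[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word
        tr = defaultdict(float, {pair: cnt / total[pair[1]] for pair, cnt in count.items()})
    return dict(tr)


corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
tr = train_ibm1(corpus, iterations=10)
for (tw, sw), p in sorted(tr.items(), key=lambda kv: -kv[1])[:4]:
    print(f"tr({tw} | {sw}) = {p:.3f}")
```

A table estimated this way would then serve as the common starting point for the IBM Model 2 and HMM training runs.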

Experimental Results

The quality of each model is measured in terms of translation, alignment, and total perplexity (lower is better):

Avalanche Bulletins	Translation	Alignment	Total
IBM Model 2	3.18	10.05	32.00
HMM Model	3.45	5.84	20.18

EuTrans Corpus	Translation	Alignment	Total
IBM Model 2	2.44	4.00	9.78
HMM Model	2.46	3.93	9.69

Verbmobil Corpus	Translation	Alignment	Total
IBM Model 2	4.70	6.54	30.71
HMM Model	4.86	5.42	26.50

From these results, it is concluded that IBM Model 2 gives slightly better (lower) translation perplexity, while the HMM-based model gives better alignment perplexity and a lower total perplexity on all three corpora. This is explained by the fact that neither distortion model is uniformly better: in some cases the relative distortion of the HMM-based model is more accurate than the absolute distortion of IBM Model 2, and in other cases the reverse holds.
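If the three values are per-word perplexities computed from the same joint probability (an assumption; the summary does not define them), the total perplexity should equal the product of the translation and alignment perplexities, because $Pr(t,a|s)$ is a product of lexical and alignment factors and their logarithms add. The Python sketch below shows this identity on hypothetical probabilities and compares it with the reported values, which agree to within 1%.

```python
import math


def perplexity(probs):
    """Per-word perplexity: exp of the average negative log-probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))


# Hypothetical per-word factors for one sentence pair.
lex = [0.7, 0.8, 0.5]    # tr(t_j | s_a(j))
dist = [0.5, 0.7, 0.4]   # Pr_a(a(j) | a(j-1), I)
joint = [lx * d for lx, d in zip(lex, dist)]
# log(lx * d) = log(lx) + log(d), so the joint perplexity factorizes exactly.
assert math.isclose(perplexity(joint), perplexity(lex) * perplexity(dist))

# (translation, alignment, total) perplexities as reported in the tables above.
REPORTED = {
    ("Avalanche Bulletins", "IBM Model 2"): (3.18, 10.05, 32.00),
    ("Avalanche Bulletins", "HMM Model"): (3.45, 5.84, 20.18),
    ("EuTrans Corpus", "IBM Model 2"): (2.44, 4.00, 9.78),
    ("EuTrans Corpus", "HMM Model"): (2.46, 3.93, 9.69),
    ("Verbmobil Corpus", "IBM Model 2"): (4.70, 6.54, 30.71),
    ("Verbmobil Corpus", "HMM Model"): (4.86, 5.42, 26.50),
}


def relative_gap(trans, align, total):
    """Relative difference between translation * alignment and the reported total."""
    return abs(trans * align - total) / total


for (corpus, model), (trans, align, total) in REPORTED.items():
    print(f"{corpus:<20} {model:<12} {trans * align:6.2f} vs {total:6.2f} "
          f"({relative_gap(trans, align, total):.1%})")
```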
