Vogal et al, COLING 1996

== Citation ==

Vogel, S., Ney, H., & Tillmann, C. (1996). HMM-based word alignment in statistical translation. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2, COLING ’96, pp. 836–841, Stroudsburg, PA, USA. Association for Computational Linguistics.

== Online version ==

ACM

== Summary ==

Word alignments map the word correspondence between two parallel sentences in different languages.

This work extends IBM Models 1 and 2, which model lexical translation probabilities and absolute distortion probabilities, by also modeling relative distortion.

Relative distortion is modeled with a first-order HMM, in which the probability of each alignment depends on the position of the previous alignment.

== Previous work ==

IBM Model 1 defines the probability of a source sentence <math>s_1^I</math>, with length <math>I</math>, being translated into a target sentence <math>t_1^J</math>, with length <math>J</math>, under the alignment <math>a_1^J</math> as:

<math>Pr(t,a|s) = \frac{\epsilon}{(I+1)^{J}}\prod_{j=1}^{J}{tr(t_j|s_{a(j)})}</math>

Here the alignment <math>a_1^J</math> is a function that maps each target word <math>t_j</math> to a source word <math>s_i</math> by their indexes. These alignments can be viewed as an explicit indication of which words correspond in a parallel text. The sentence translation probability <math>Pr(t,a|s)</math> is decomposed into the product of the lexical translation probabilities <math>tr(t_j|s_{a(j)})</math> of each word in the target <math>t_1,...,t_{J}</math> with the source word it is aligned to, <math>s_{a(j)}</math>. Additionally, target words that are not aligned with any source word are aligned with the null token, with a lexical translation probability given by <math>tr(t_j|null)</math>; these are referred to as null insertions. The normalizing factor <math>\frac{\epsilon}{(I+1)^{J}}</math> ensures that <math>Pr(t,a|s)</math> is a probability, normalized over all possible alignments <math>a</math> and all possible translations <math>t</math>.
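A minimal sketch of how this probability can be computed for a fixed alignment, assuming a hypothetical lexical translation table <code>lex</code> that holds the <math>tr(t_j|s_i)</math> values (the function and variable names are illustrative, not from the paper):

<pre>
# Sketch of IBM Model 1 scoring for a fixed alignment (illustrative only).
EPSILON = 1.0      # the constant epsilon from the formula (placeholder value)
NULL = "<null>"    # null token for unaligned target words

def model1_score(source, target, alignment, lex):
    """Pr(t,a|s): alignment[j] is the source position (0 = null token)
    that target word j is aligned to."""
    src = [NULL] + source                      # position 0 is the null token
    I, J = len(source), len(target)
    prob = EPSILON / (I + 1) ** J              # normalizing factor
    for j, t_word in enumerate(target):
        prob *= lex.get((t_word, src[alignment[j]]), 1e-12)   # tr(t_j | s_a(j))
    return prob
</pre>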

One of the problems of IBM Model 1 is that it is very weak at handling reordering, since <math>Pr(t,a|s)</math> is calculated using only the lexical translation probabilities <math>tr(t_j|s_{a(j)})</math>. Because of this, if the model is presented with two translation candidates that contain the same lexical translations but a different ordering of the translated words, it assigns both of them the same score.
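Continuing the sketch above with toy numbers (purely illustrative), two candidates that permute the same translated words receive exactly the same Model 1 score, because only the lexical factors enter the product:

<pre>
# Illustrative lexical table and sentence pair (toy values, not real estimates).
lex = {("the", "das"): 0.6, ("house", "Haus"): 0.7}
source = ["das", "Haus"]
cand_a = ["the", "house"]    # correctly ordered candidate
cand_b = ["house", "the"]    # reordered candidate
# The best alignment for each candidate points at the same source words,
# so both products of lexical probabilities are identical.
p_a = model1_score(source, cand_a, [1, 2], lex)
p_b = model1_score(source, cand_b, [2, 1], lex)
assert abs(p_a - p_b) < 1e-12   # Model 1 cannot tell the two orderings apart
</pre>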

Mixture-based alignment models (IBM Model 2) address this problem by modeling the absolute distortion in word positioning between the two languages, introducing an alignment probability distribution <math>Pr_a(i|j,J,I)</math>, where <math>i</math> and <math>j</math> are the word positions in the source and target sentences. Thus the equation for <math>Pr(t,a|s)</math> becomes:

<math>Pr(t,a|s) = \epsilon\prod_{j=1}^{J}{tr(t_j|s_{a(j)}) Pr_a(a(j)|j,J,I)}</math>
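A corresponding sketch for Model 2, reusing the definitions from the Model 1 sketch and assuming a hypothetical dictionary <code>dist</code> that stores the alignment distribution <math>Pr_a(i|j,J,I)</math> (again an illustration, not the paper's implementation):

<pre>
def model2_score(source, target, alignment, lex, dist):
    """Pr(t,a|s) under IBM Model 2: every target position j also pays the
    alignment probability Pr_a(a(j)|j,J,I) of the source position it chooses."""
    src = [NULL] + source
    I, J = len(source), len(target)
    prob = EPSILON
    for j, t_word in enumerate(target):
        i = alignment[j]                               # aligned source position
        prob *= lex.get((t_word, src[i]), 1e-12)       # tr(t_j | s_a(j))
        prob *= dist.get((i, j + 1, J, I), 1e-12)      # Pr_a(i | j, J, I), 1-based j
    return prob
</pre>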

== Algorithm ==

While IBM Model 2 attempts to model the absolute distortion of words in sentence pairs through <math>Pr_a(i|j,J,I)</math>, alignments have a strong tendency to preserve the local neighborhood after translation, a regularity that absolute positions capture only indirectly and that motivates conditioning each alignment on the previous one.
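A sketch of the scoring step this suggests, continuing the sketches above: the absolute alignment term is replaced by a relative jump probability that depends only on the distance from the previous alignment (the <code>jump</code> table and names are assumptions for illustration; the full model also conditions on the source length <math>I</math> and is trained with the forward-backward algorithm):

<pre>
def hmm_align_score(source, target, alignment, lex, jump):
    """First-order HMM alignment: the probability of aligning target position j
    to source position a(j) depends on the jump a(j) - a(j-1), which favours
    staying in the local neighborhood of the previous alignment."""
    src = [NULL] + source
    prob = 1.0
    prev = 0                                      # assumed initial position
    for j, t_word in enumerate(target):
        i = alignment[j]
        prob *= jump.get(i - prev, 1e-12)         # relative distortion p(a(j) | a(j-1))
        prob *= lex.get((t_word, src[i]), 1e-12)  # tr(t_j | s_a(j))
        prev = i
    return prob

# Illustrative jump distribution: small forward jumps are the most likely moves.
jump = {-1: 0.1, 0: 0.2, 1: 0.4, 2: 0.2, 3: 0.1}
</pre>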