IBM Model 1

From Cohen Courses
Latest revision as of 23:29, 29 September 2011

Citation

Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., & Mercer, R. L. (1993). The mathematics of statistical machine translation: parameter estimation. Comput. Linguist., 19, 263–311.

Online version

[http://dl.acm.org/ft_gateway.cfm?id=972474&type=pdf&CFID=49761657&CFTOKEN=94001682 pdf]

Summary

IBM Model 1 is a word alignment model that uses the Expectation Maximization (EM) algorithm to estimate lexical translation probabilities from parallel text.
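As a rough illustration of how EM can estimate lexical translation probabilities, here is a minimal Python sketch. It is not taken from the cited paper: the function name `train_model1`, the `NULL` marker string, the uniform initialisation, and the fixed iteration count are all assumptions made for this example.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical marker for the null source token


def train_model1(corpus, iterations=10):
    """Estimate tr(t|s) with EM.

    corpus: list of (source_tokens, target_tokens) pairs.
    Returns a dict-like table where tr[(t, s)] approximates tr(t|s).
    """
    target_vocab = {t for _, tgt in corpus for t in tgt}
    uniform = 1.0 / len(target_vocab)
    tr = defaultdict(lambda: uniform)  # assumed uniform starting point

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (t, s) link
        total = defaultdict(float)  # expected number of links leaving s
        for src, tgt in corpus:
            src = [NULL] + list(src)  # every target word may align to null
            for t in tgt:
                # E-step: posterior over which source word generated t
                z = sum(tr[(t, s)] for s in src)
                for s in src:
                    p = tr[(t, s)] / z
                    count[(t, s)] += p
                    total[s] += p
        # M-step: renormalise the expected counts per source word
        tr = defaultdict(float, {(t, s): c / total[s] for (t, s), c in count.items()})
    return tr
```

Called on a small toy corpus such as `[(["das", "haus"], ["the", "house"]), (["das", "buch"], ["the", "book"]), (["ein", "buch"], ["a", "book"])]`, the table should come to favour pairs that co-occur consistently, for example `("the", "das")` over `("house", "das")`.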

Model

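IBM Model 1 defines the probability of a source sentence <math>s_1^I</math>, with length <math>I</math>, being translated to a target sentence <math>t_1^J</math>, with length <math>J</math>, with the alignment <math>a_1^J</math> as:

<math>Pr(t,a|s) = \frac{\epsilon}{(I+1)^{J}} \prod_{j=1}^{J} tr(t_j|s_{a(j)})</math>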

Where the alignment <math>a_1^J</math> is a function that maps each target word <math>t_j</math> to a source word <math>s_i</math> by index. An alignment can be viewed as an object that indicates the corresponding words in a parallel text. The sentence translation probability <math>Pr(t,a|s)</math> decomposes into the product of the lexical translation probabilities <math>tr(t_j|s_{a(j)})</math> of each target word <math>t_1,...,t_{J}</math> given the source word <math>s_{a(j)}</math> it is aligned to. Additionally, target words that are not aligned with any source word are aligned with the null token, with a lexical translation probability given by <math>tr(t_j|null)</math>. These are referred to as null insertions. The normalizing factor <math>\frac{\epsilon}{(I+1)^{J}}</math> ensures that <math>Pr(t,a|s)</math> is a probability, normalized over all possible alignments <math>a</math> and all possible translations <math>t</math>.
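The decomposition described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `alignment_probability`, the `NULL` marker, the convention that alignment index 0 denotes the null token, and the toy probability table are assumptions, not part of the original description.

```python
NULL = "<null>"  # hypothetical marker for the null source token


def alignment_probability(source, target, alignment, tr, epsilon=1.0):
    """Score one alignment: epsilon / (I+1)^J times the product of tr(t_j | s_a(j)).

    alignment[j] is a source position in 0..I, where 0 stands for the null token.
    tr maps (target_word, source_word) pairs to probabilities; missing pairs count as 0.
    """
    src = [NULL] + list(source)  # position 0 is the null token
    I, J = len(source), len(target)
    prob = epsilon / (I + 1) ** J  # normalizing factor
    for j, t in enumerate(target):
        prob *= tr.get((t, src[alignment[j]]), 0.0)  # lexical translation term
    return prob


# Toy table with made-up values, for illustration only.
toy_tr = {("the", "das"): 0.7, ("house", "haus"): 0.8, ("the", NULL): 0.1}
score = alignment_probability(["das", "haus"], ["the", "house"], [1, 2], toy_tr)
```

Here `[1, 2]` aligns "the" with "das" and "house" with "haus"; changing the first entry to `0` would instead treat "the" as a null insertion.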

Related Work

One of the problems of IBM Model 1 is that it is insensitive to reordering, since <math>Pr(t,a|s)</math> is calculated using only the lexical translation probabilities <math>tr(t|s)</math>. Because of this, if the model is presented with two translation candidates <math>t_1</math> and <math>t_2</math> that use the same lexical translations but order the translated words differently, it assigns both the same score.

This problem is addressed in subsequent IBM Models, such as IBM Model 2.
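The order-insensitivity can be checked numerically. The sketch below scores a whole target sentence by summing over all alignments, using the standard factorised form in which each target word contributes the sum of its translation probabilities over all source positions, including null. The function name `sentence_probability`, the `NULL` marker, and the toy table are assumptions made for this example.

```python
import math

NULL = "<null>"  # hypothetical marker for the null source token


def sentence_probability(source, target, tr, epsilon=1.0):
    """Sum of Pr(t, a | s) over all alignments a, in factorised form.

    Each target word contributes the sum of tr(t_j | s_i) over every source
    position i, including the null token, so word order never enters the score.
    """
    src = [NULL] + list(source)
    prob = epsilon / len(src) ** len(target)
    for t in target:
        prob *= sum(tr.get((t, s), 0.0) for s in src)
    return prob


# Toy table with made-up values, for illustration only.
toy_tr = {
    ("the", "das"): 0.7,
    ("the", NULL): 0.1,
    ("house", "haus"): 0.8,
    ("house", "das"): 0.05,
}
source = ["das", "haus"]
in_order = sentence_probability(source, ["the", "house"], toy_tr)
reordered = sentence_probability(source, ["house", "the"], toy_tr)
same_score = math.isclose(in_order, reordered)  # both orderings get the same score
```

Because the score is a product of per-word terms, any permutation of the target words leaves it unchanged, which is the weakness that a position-aware alignment model is meant to address.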