IBM Model 1

From Cohen Courses
Revision as of 09:18, 27 September 2011 by Lingwang

Model

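IBM Model 1 defines the probability of a source sentence $s_1^J$, with length $J$, being translated to a target sentence $t_1^I$, with length $I$, under the alignment $a$ as:

$$Pr(t_1^I, a \mid s_1^J) = \frac{\epsilon}{(J+1)^I} \prod_{i=1}^{I} tr(t_i \mid s_{a(i)})$$

Here the alignment $a$ is a function that maps each target word $t_i$ to a source word $s_{a(i)}$ by index. An alignment can be viewed as a record of which words correspond to each other in a parallel text. The sentence translation probability $Pr(t_1^I, a \mid s_1^J)$ is decomposed into the product of the lexical translation probabilities $tr(t_i \mid s_{a(i)})$ of each target word $t_i$ with the source word $s_{a(i)}$ that it is aligned to. Target words that are not aligned with any source word are aligned with the null token, conventionally placed at source position 0 and written $s_0$, with a lexical translation probability given by $tr(t_i \mid s_0)$. These are referred to as null insertions. The normalizing factor $\frac{\epsilon}{(J+1)^I}$ ensures that $Pr(t_1^I, a \mid s_1^J)$ is a probability, normalized over all possible alignments $a$ and all possible translations $t_1^I$: each of the $I$ target words can align to any of the $J$ source words or to the null token, which gives $(J+1)^I$ possible alignments.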
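
As a minimal sketch of the definition above, the following Python function scores one target sentence under one fixed alignment. It assumes the standard Model 1 form, in which the normalizing factor is epsilon / (J + 1)^I and the null token sits at source position 0. The function name `alignment_prob`, the `NULL` marker, the dictionary layout of `tr`, and all the probability values are hypothetical choices made for illustration; they do not come from the page.

```python
from math import prod

NULL = "<null>"  # hypothetical marker for the null token at source position 0


def alignment_prob(source, target, alignment, tr, epsilon=1.0):
    """Return Pr(t, a | s) for one fixed alignment.

    alignment[i] is the source position (1..J) that target word i is aligned
    to, or 0 for the null token. tr maps (target_word, source_word) pairs to
    lexical translation probabilities; missing pairs count as 0.
    """
    if len(alignment) != len(target):
        raise ValueError("alignment needs one entry per target word")
    padded = [NULL] + list(source)  # index 0 is the null token
    J, I = len(source), len(target)
    if any(not 0 <= a_i <= J for a_i in alignment):
        raise ValueError("alignment entries must lie in 0..J")
    # Product of the lexical translation probabilities of the aligned pairs.
    lexical = prod(tr.get((t_word, padded[a_i]), 0.0)
                   for t_word, a_i in zip(target, alignment))
    # Normalizing factor epsilon / (J + 1)^I times the lexical product.
    return epsilon / (J + 1) ** I * lexical


# Made-up probabilities, for illustration only.
tr = {
    ("the", "das"): 0.5,
    ("house", "Haus"): 0.5,
    ("the", NULL): 0.25,
}
source = ["das", "Haus"]
target = ["the", "house"]

p_aligned = alignment_prob(source, target, [1, 2], tr)  # the->das, house->Haus
p_null = alignment_prob(source, target, [0, 2], tr)     # "the" is a null insertion
print(p_aligned, p_null)
```

With these toy values the alignment that treats "the" as a null insertion scores lower only because its made-up lexical probability is lower. The normalizing factor is identical for both alignments.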

One of the problems of IBM Model 1 is that it is very weak at modeling reordering, since the sentence translation probability $Pr(t_1^I, a \mid s_1^J)$ is calculated using only the lexical translation probabilities $tr(t_i \mid s_{a(i)})$. Because of this, if the model is presented with two translation candidates $t$ and $t'$ that have the same lexical translations but a different ordering of the translated words, it assigns both translations the same score.
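
The reordering weakness can be checked numerically. The sketch below assumes the alignment is summed out, in which case the Model 1 sentence probability factorizes into a product over target words of the summed lexical probabilities, so the order of the target words cannot affect the result. The function name `sentence_prob`, the `NULL` marker, and the probability table are hypothetical and made up for illustration.

```python
from itertools import permutations
from math import isclose, prod

NULL = "<null>"  # hypothetical marker for the null token


def sentence_prob(source, target, tr, epsilon=1.0):
    """Return Pr(t | s), summed over all (J + 1)^I alignments.

    Uses the factorized form: for each target word, sum its lexical
    translation probability over every source word plus the null token,
    then multiply those per-word sums together.
    """
    padded = [NULL] + list(source)
    J, I = len(source), len(target)
    per_word = [sum(tr.get((t_word, s_word), 0.0) for s_word in padded)
                for t_word in target]
    return epsilon / (J + 1) ** I * prod(per_word)


# Made-up probabilities, for illustration only.
tr = {
    ("the", "das"): 0.5, ("the", NULL): 0.25,
    ("house", "Haus"): 0.5, ("house", "das"): 0.125,
    ("small", "kleine"): 0.75,
}
source = ["das", "kleine", "Haus"]

# Every ordering of the same three target words is a candidate translation.
candidates = [list(p) for p in permutations(["the", "small", "house"])]
scores = [sentence_prob(source, c, tr) for c in candidates]

# True when the model cannot tell any of the orderings apart.
same_score = all(isclose(s, scores[0]) for s in scores)
print(same_score)
```

Because the score is a product of per-word terms, "the small house" and "house small the" are indistinguishable to the model.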