Dyer et al, ACL 2011
Being edited by Rui Correia
== Citation ==

C. Dyer, J. Clark, A. Lavie, and N. A. Smith. 2011. [http://www.cs.cmu.edu/~nasmith/papers/dyer+clark+lavie+smith.acl11.pdf Unsupervised Word Alignment with Arbitrary Features]. In Proceedings of HLT-ACL 2011, Volume 1, pp. 409–419.

== Summary ==
In this [[Category::paper]] the authors address the [[AddressesProblem::Word Alignments]] problem in an unsupervised fashion, removing the need to manually develop a gold standard, which is difficult and expensive to create and dependent on the task at hand, especially for resource-scarce languages. The model introduced is discriminatively trained and globally normalized: a variant of [[RelatedPaper::IBM_Model_1|IBM Model 1]] that allows the incorporation of non-independent features.

The paper focuses on the newly proposed model and on the features designed to generate the word alignments. The authors report results for several language pairs, comparing their approach against IBM Model 4 with respect to BLEU, METEOR, and TER scores. Additionally, they analyze how the different language pairs make use of the designed features, and how these preferences are representative of each language.
== Model ==

The proposed conditional model assigns a probability to a target sentence <math>t</math> of length <math>n</math>, given a source sentence <math>s</math> of length <math>m</math>. Using the chain rule, the authors factor <math>p(t|s)</math> into a translation model <math>p(t|s,n)</math> and a length model <math>p(n|s)</math>, i.e.,

<math>
p(t|s) = p(t,n|s) = p(t|s,n) \times p(n|s)
</math>

Regarding the translation model, the authors assume that each word of the target sentence is the translation of a single word of the source sentence or of a special null token, introducing a latent alignment variable <math>a = \langle a_1, a_2, \ldots, a_n \rangle \in [0,m]^n</math>, i.e.,

<math>
p(t|s,n) = \sum_a p(t,a|s,n)
</math>
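Under the classic Model 1 independence assumptions (which this paper relaxes), the sum over alignments factorizes per target position, so the marginal can be computed exactly. A minimal Python sketch, assuming a toy translation table; the function name and probability values are illustrative, not from the paper:

```python
import math

def model1_log_prob(src, tgt, t_table, null_token="<NULL>"):
    """Log p(t | s, n) under IBM Model 1: each target word aligns
    independently and uniformly to one source word or the null token,
    so the sum over alignments factorizes per target position."""
    src_ext = [null_token] + src  # positions 0..m, where 0 is null
    logp = 0.0
    for t_word in tgt:
        # marginalize the latent alignment link for this position
        total = sum(t_table.get((s_word, t_word), 1e-12) for s_word in src_ext)
        logp += math.log(total / len(src_ext))
    return logp

# hypothetical translation table p(t|s) for a French-English toy pair
t_table = {("la", "the"): 0.9, ("maison", "house"): 0.8,
           ("<NULL>", "the"): 0.05, ("<NULL>", "house"): 0.01}
lp = model1_log_prob(["la", "maison"], ["the", "house"], t_table)
```

Each factor averages the translation probabilities over the extended source sentence, which is why the alignment variable never needs to be enumerated explicitly.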
It is at this point that the model diverges from the [[IBM_Model_1|Brown et al.]] approach: rather than a simple categorical translation table, the model is parameterized so that arbitrary, non-independent features can be incorporated.
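For intuition on how arbitrary features can enter such a model, here is a log-linear scoring sketch: each candidate link receives a score <math>\exp(w \cdot f)</math>, normalized into a distribution. The feature names and weights are invented for illustration, and the paper's actual model is globally normalized over entire alignments rather than locally per link:

```python
import math

def loglinear_link_probs(candidate_feats, weights):
    """p(link) proportional to exp(w . f): score every candidate source
    word for one target position, then normalize over the candidates."""
    scores = [math.exp(sum(weights.get(f, 0.0) * v for f, v in feats.items()))
              for feats in candidate_feats]
    z = sum(scores)
    return [s / z for s in scores]

# hypothetical features for aligning target "house" to each source candidate
candidates = [
    {"orthographic_similarity": 0.0, "is_null": 1.0},   # <NULL>
    {"orthographic_similarity": 0.1},                   # "la"
    {"orthographic_similarity": 0.6},                   # "maison"
]
weights = {"orthographic_similarity": 4.0, "is_null": -1.0}
probs = loglinear_link_probs(candidates, weights)
```

Because the score is a dot product over feature values, overlapping and non-independent features (orthography, positional biases, and so on) combine naturally, which a categorical table cannot express.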
== Features ==

[[RelatedPaper::Marcus and Wong, EMNLP 2002]]

== Experimental Results ==

[[UsesDataset::EUROPARL]]

[[File:Koehncoremethods.png|200px]]
Revision as of 11:06, 29 November 2011