Dyer et al, ACL 2011
Being edited by Rui Correia
== Citation ==

C. Dyer, J. Clark, A. Lavie, and N. A. Smith. 2011. [http://www.cs.cmu.edu/~nasmith/papers/dyer+clark+lavie+smith.acl11.pdf Unsupervised Word Alignment with Arbitrary Features]. In Proceedings of HLT-ACL 2011, Volume 1, pp. 409–419.

== Summary ==
In this [[Category::paper]] the authors address the [[AddressesProblem::Word Alignments]] problem in an unsupervised fashion, removing the need to manually develop a gold standard, which is difficult and expensive to create and dependent on the task at hand, especially for resource-scarce languages. The model introduced is discriminatively trained and globally normalized: a variant of [[RelatedPaper::IBM_Model_1|IBM Model 1]] that allows the incorporation of non-independent features.

The paper focuses on the newly proposed model and on the features designed to generate the word alignments. The authors report results for several language pairs, comparing their approach against IBM Model 4 with respect to BLEU, METEOR, and TER scores. Additionally, they analyze how the different language pairs make use of the designed features, and how these preferences are representative of each language.
== Model ==

The proposed conditional model assigns a probability to a target sentence <math>t</math> of length <math>n</math>, given a source sentence <math>s</math> of length <math>m</math>. Using the chain rule, the authors factor <math>p(t|s)</math> into a translation model <math>p(t|s,n)</math> and a length model <math>p(n|s)</math>, i.e.,

<math>
p(t|s) = p(t,n|s) = p(t|s,n) \times p(n|s)
</math>

Regarding the translation model, the authors assume that each word of the target sentence is the translation of a single word of the source sentence or of a special null token, introducing a latent alignment variable <math>a = \langle a_1, a_2, \ldots, a_n \rangle \in [0,m]^n</math>, i.e.,

<math>
p(t|s,n) = \sum_a p(t,a|s,n)
</math>
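Under the classic Model 1 independence assumptions (which this paper relaxes), the sum over alignments factorizes per target position, so the marginal can be computed exactly. A minimal Python sketch, assuming a toy translation table; the function name and probability values are illustrative, not from the paper:

```python
import math

def model1_log_prob(src, tgt, t_table, null_token="<NULL>"):
    """Log p(t | s, n) under IBM Model 1: each target word aligns
    independently and uniformly to one source word or the null token,
    so the sum over alignments factorizes per target position."""
    src_ext = [null_token] + src  # positions 0..m, where 0 is null
    logp = 0.0
    for t_word in tgt:
        # marginalize the latent alignment link for this position
        total = sum(t_table.get((s_word, t_word), 1e-12) for s_word in src_ext)
        logp += math.log(total / len(src_ext))
    return logp

# hypothetical translation table p(t|s) for a French-English toy pair
t_table = {("la", "the"): 0.9, ("maison", "house"): 0.8,
           ("<NULL>", "the"): 0.05, ("<NULL>", "house"): 0.01}
lp = model1_log_prob(["la", "maison"], ["the", "house"], t_table)
```

Each factor averages the translation probabilities over the extended source sentence, which is why the alignment variable never needs to be enumerated explicitly.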
It is at this point that the model diverges from the [[IBM_Model_1|Brown et al.]] approach: rather than a simple categorical translation table, the model is parameterized so that arbitrary, non-independent features can be incorporated.
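For intuition on how arbitrary features can enter such a model, here is a log-linear scoring sketch: each candidate link receives a score <math>\exp(w \cdot f)</math>, normalized into a distribution. The feature names and weights are invented for illustration, and the paper's actual model is globally normalized over entire alignments rather than locally per link:

```python
import math

def loglinear_link_probs(candidate_feats, weights):
    """p(link) proportional to exp(w . f): score every candidate source
    word for one target position, then normalize over the candidates."""
    scores = [math.exp(sum(weights.get(f, 0.0) * v for f, v in feats.items()))
              for feats in candidate_feats]
    z = sum(scores)
    return [s / z for s in scores]

# hypothetical features for aligning target "house" to each source candidate
candidates = [
    {"orthographic_similarity": 0.0, "is_null": 1.0},   # <NULL>
    {"orthographic_similarity": 0.1},                   # "la"
    {"orthographic_similarity": 0.6},                   # "maison"
]
weights = {"orthographic_similarity": 4.0, "is_null": -1.0}
probs = loglinear_link_probs(candidates, weights)
```

Because the score is a dot product over feature values, overlapping and non-independent features (orthography, positional biases, and so on) combine naturally, which a categorical table cannot express.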
== Features ==

[[RelatedPaper::Marcus and Wong, EMNLP 2002]]

== Experimental Results ==

[[UsesDataset::EUROPARL]]

[[File:Koehncoremethods.png|200px]]
Revision as of 11:06, 29 November 2011