DeNero et al, EMNLP 2008

== Citation ==

DeNero, J., Bouchard-Côté, A., & Klein, D. (2008). Sampling alignment structure under a Bayesian translation model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08). Association for Computational Linguistics, Stroudsburg, PA, USA, 314-323.

== Online version ==

ACM

== Summary ==

Unlike word-to-phrase alignments, computing the alignment and count expectations for phrase-to-phrase alignments is generally intractable, due to the exponential growth in the number of possible combinations of phrase segmentations and alignments. Thus, previous attempts at building a fully joint phrase alignment model have been unsuccessful.

This paper describes the first tractable phrase-to-phrase alignment model, which relies on [[Gibbs Sampling]] to tackle the intractability issues.

Tests show translation improvements over Machine Translation systems built using conventional methods.
  
 
== Phrase Alignment Model ==

The phrase-to-phrase alignment model presented in this work is built upon the work in [[Marcus and Wong, EMNLP 2002]]. In that work, words are clustered into phrases by a generative process, which constructs an ordered set of phrases <math>t_{1:m}</math> in the target language, an ordered set of phrases <math>s_{1:n}</math> in the source language, and the alignments between phrases <math>a=\{(j,k)\}</math>, each of which indicates that the target phrase <math>t_j</math> is aligned with the source phrase <math>s_k</math>. The process is composed of two steps:
  
* First, the number of components <math>l</math> is chosen, and each of the <math>l</math> phrase pairs is generated independently.

* Then, an ordering for the source phrases is chosen, and all the source and target phrases are aligned one to one.
 
  
The choice of <math>l</math> is parametrized using a geometric distribution <math>P_G</math>, with the stop parameter <math>p_$</math>:
 
 
<math>
 
P(l) = P_G(l;p_$) = p_$ \times (1 - p_$)^{l - 1}
 
</math>
 
 
 
Phrase pairs are drawn from an unknown multinomial distribution <math>\theta_J</math>.
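For concreteness, this generative story can be sketched as a small sampling routine. The snippet below is only an illustration: the phrase-pair table is a toy stand-in for <math>\theta_J</math>, and the uniform shuffle at the end merely stands in for the reordering step, which in the actual model is governed by the distortion model described next.

<pre>
import random

# Toy stand-in for the phrase-pair multinomial theta_J (illustrative values).
theta_J = {
    ("the house", "la maison"): 0.4,
    ("is small", "est petite"): 0.3,
    ("very", "tres"): 0.2,
    ("today", "aujourd'hui"): 0.1,
}

def generate_phrase_pairs(p_stop=0.1):
    """Step 1: draw the number of components l from P_G(l; p_$) and draw
    l phrase pairs independently from theta_J.  Step 2: choose an ordering
    for the source phrases (uniform here, as a stand-in for the distortion
    model), giving a one-to-one alignment between source and target phrases."""
    l = 1
    while random.random() > p_stop:      # P(l) = p_$ * (1 - p_$)^(l - 1)
        l += 1
    pairs = list(theta_J)
    weights = list(theta_J.values())
    drawn = random.choices(pairs, weights=weights, k=l)
    source_order = list(range(l))
    random.shuffle(source_order)
    return drawn, source_order

print(generate_phrase_pairs())
</pre>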
 
 
 
A simple position-based distortion model is used, where:
 
 
 
<math>
 
P(a|[t,s]) \propto \prod_{a_i\in a} \delta(a_i)
 
</math>
 
 
 
<math>
\delta(a_i = (j,k)) = b^{|pos(t_j)-pos(s_k)\times s|}
</math>
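As a small illustration of how this distortion term behaves, the sketch below evaluates <math>\delta</math> for a single link. The phrase position is taken to be the phrase's start word index and <math>s</math> a sentence length ratio; both are assumptions made for the example, not details given above.

<pre>
def distortion_weight(pos_t, pos_s, s=1.0, b=0.85):
    """delta(a_i) = b^|pos(t_j) - pos(s_k) * s|: links whose target and
    (scaled) source positions are close keep a weight near 1, and the
    weight decays geometrically in b as the phrases move apart."""
    return b ** abs(pos_t - pos_s * s)

print(distortion_weight(pos_t=2, pos_s=2))   # 1.0: a monotone link
print(distortion_weight(pos_t=2, pos_s=9))   # ~0.32: a long-distance jump
</pre>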
 
 
 
Finally, the joint probability of an aligned sentence pair consisting of <math>l</math> phrase pairs is given by:
 
 
 
<math>
 
P([t,s],a) = P_G(l;p_$)P(a|[t,s])\prod_{[t,s]}\theta_J([t,s])
 
</math>
 
 
 
In the experiments, the parameters <math>p_$</math> and <math>b</math> were set to 0.1 and 0.85, respectively.
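Putting the pieces together, the following sketch evaluates the joint probability above for one fixed segmentation and alignment. The numbers are illustrative toy values rather than learned parameters, and the per-link distortion weights <math>\delta(a_i)</math> are passed in directly.

<pre>
import math

def joint_probability(pair_probs, distortion_weights, p_stop=0.1):
    """P([t,s], a) = P_G(l; p_$) * P(a|[t,s]) * prod_{[t,s]} theta_J([t,s]).

    pair_probs:         theta_J probabilities of the l aligned phrase pairs
    distortion_weights: the delta(a_i) terms, one per alignment link
    p_stop:             the stop parameter p_$ of the geometric length model
    """
    l = len(pair_probs)
    p_length = p_stop * (1 - p_stop) ** (l - 1)    # P_G(l; p_$)
    p_align = math.prod(distortion_weights)        # proportional to P(a|[t,s])
    p_pairs = math.prod(pair_probs)                # product of theta_J terms
    return p_length * p_align * p_pairs

# Toy usage with two aligned phrase pairs and b = 0.85 baked into the weights.
print(joint_probability(pair_probs=[0.4, 0.3], distortion_weights=[1.0, 0.85]))
</pre>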
 
 
 
== Proposed Phrase Alignment Model ==
 
In the previous model, each source phrase had to be aligned with a target phrase and vice versa. However, sentence pairs do not always contain the same information on both sides, so it should be expected that some phrases are left unaligned. This work extends the previous alignment model by modeling unaligned phrases.

This is done by introducing another multinomial distribution <math>\theta_N</math>, from which unaligned phrase pairs are drawn. A free parameter <math>P_{N}</math> is set, defining the probability of a null phrase. This parameter is set to <math>10^{-10}</math> to discourage unaligned phrases.
 
 
 
Thus, the new model is defined as:
 
<math>
 
P([t,s],a) = P_G(l;p_$)P(a|[t,s])\prod_{[t,s]}P_M([t,s])
 
</math>
 
 
 
where <math>P_M([t,s])</math> is defined as:
 
 
 
<math>
 
P_M([t,s]) = p_{N}\theta_N([t,s]) + (1 - p_{N})\theta_J([t,s])
 
</math>
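The mixture is straightforward to evaluate. The sketch below shows how the null-phrase component enters; the dictionaries are toy stand-ins for <math>\theta_J</math> and <math>\theta_N</math>, and representing a null phrase as a pair with a None side is an assumption made for the example.

<pre>
def phrase_pair_prob(pair, theta_J, theta_N, p_null=1e-10):
    """P_M([t,s]) = p_N * theta_N([t,s]) + (1 - p_N) * theta_J([t,s]).
    theta_N covers unaligned (null) phrase pairs, theta_J aligned ones;
    p_null is the free parameter P_N, fixed to 1e-10 in this work."""
    return p_null * theta_N.get(pair, 0.0) + (1 - p_null) * theta_J.get(pair, 0.0)

# Toy usage: a null phrase is represented here as a pair with a None side.
theta_J = {("the house", "la maison"): 0.4}
theta_N = {("today", None): 0.05}
print(phrase_pair_prob(("the house", "la maison"), theta_J, theta_N))
print(phrase_pair_prob(("today", None), theta_J, theta_N))
</pre>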
 
 
 
This model involves the observed sentence pairs <math>x</math>, the latent phrase segmentations and alignments <math>z</math>, and the parameters <math>\theta</math> (<math>\theta_N</math> and <math>\theta_J</math>), from which phrases are drawn. The model can be used to compute expected counts for phrase pairs under <math>P(z|x,\theta)</math> for a fixed <math>\theta</math>, which could, in turn, be used in the E-Step of EM, similarly to Word Alignments. However, in the case of phrase alignments, computing these expectations exactly would require summing the phrase pair counts over all possible segmentations and alignments <math>z</math>, a space far too large to enumerate; this was the main drawback of the model in [[Marcus and Wong, EMNLP 2002]]. Thus, this work tackles the problem with [[Gibbs Sampling]]: rather than iterating over all possible <math>z</math>, it generates a sequence of samples that approximate the conditional distribution <math>P(z|x,\theta)</math>.
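To get a sense of the scale involved, a rough back-of-the-envelope count (not from the paper) is already telling: a sentence of <math>n</math> words can be segmented into contiguous phrases in <math>2^{n-1}</math> ways, before any alignment choices are even made.

<pre>
def num_segmentations(n_words):
    """Number of ways to split a sentence of n_words into contiguous,
    non-empty phrases: each of the n-1 gaps between adjacent words is
    either a phrase boundary or not."""
    return 2 ** (n_words - 1)

# A 20-word sentence alone admits over half a million segmentations,
# and z also ranges over the other sentence's segmentation and the alignment.
print(num_segmentations(20))   # 524288
</pre>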
 
  
 
== Gibbs Sampler ==


The Gibbs sampler presented in this work starts with an initial state <math>z_0</math>, which sets an initial configuration for the phrase segmentation and alignment. Then, by applying a set of local changes starting from <math>z_0</math>, a sequence of samples is produced. These samples are guaranteed to approximate the conditional distribution <math>P(z|x,\theta)</math>. This means that the average counts for phrase pairs in the samples converge to the expected counts under the model, and can therefore be used to produce estimates for translation features such as the phrase translation probability.
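In outline, the sampler repeatedly applies local operators (described below) and accumulates phrase pair counts from the visited states; averaging those counts over enough samples approximates the expected counts under <math>P(z|x,\theta)</math>. The skeleton below is a schematic sketch rather than the paper's implementation: the state object, its aligned_phrase_pairs() accessor, and the random choice of operator are all placeholder assumptions.

<pre>
from collections import Counter
import random

def gibbs_phrase_pair_counts(z0, operators, num_samples, burn_in=100):
    """Run a Gibbs-style chain from the initial state z0.

    operators: functions that take a state and return a new state by
               resampling one local region while its Markov blanket is frozen.
    Returns phrase pair counts averaged over the collected samples, which
    approximate the expected counts under P(z|x, theta).
    """
    state = z0
    counts = Counter()
    kept = 0
    for step in range(burn_in + num_samples):
        op = random.choice(operators)                    # placeholder scan order
        state = op(state)
        if step >= burn_in:
            for pair in state.aligned_phrase_pairs():    # assumed accessor
                counts[pair] += 1
            kept += 1
    return {pair: c / kept for pair, c in counts.items()}
</pre>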

This work defines a set of operators that can be applied to a state <math>z_i</math> to generate a new state <math>z_{i+1}</math>. Each operator freezes the state <math>z_i</math>, except for a local region, and chooses a reconfiguration of that region from among all possible reconfigurations, according to the conditional probability of each reconfiguration given the frozen part of the state. The frozen part of the state is called the Markov blanket.

The SWAP operator freezes all phrases, and then chooses two target phrases <math>t_{j_1}</math> and <math>t_{j_2}</math>. All alignments are also frozen, except for the alignments <math>(j_1,k_1)</math> and <math>(j_2,k_2)</math>. The operator must then choose either to keep these alignments or to switch to the alignments <math>(j_1,k_2)</math> and <math>(j_2,k_1)</math>.
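As a concrete sketch of one such operator, the snippet below makes SWAP's binary choice by scoring both reconfigurations and sampling one in proportion to its probability. The alignment representation and the score function are hypothetical stand-ins for the model probability restricted to the Markov blanket; this is an illustration, not the paper's code.

<pre>
import random

def swap_operator(alignments, j1, j2, score):
    """SWAP: freeze everything except the alignment links of target
    phrases j1 and j2.

    alignments: dict mapping target-phrase index j -> source-phrase index k
    score:      function returning the (unnormalized) model probability of a
                configuration, restricted to the unfrozen local region
    Chooses between keeping (j1,k1),(j2,k2) and swapping to (j1,k2),(j2,k1).
    """
    k1, k2 = alignments[j1], alignments[j2]
    keep = dict(alignments)
    swap = dict(alignments)
    swap[j1], swap[j2] = k2, k1
    w_keep, w_swap = score(keep), score(swap)
    # Sample the reconfiguration in proportion to its conditional probability.
    if random.random() < w_keep / (w_keep + w_swap):
        return keep
    return swap
</pre>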