Word Alignments using an HMM-based model

== Project Name ==

Phrasal Alignments using Posterior Regularization
== Summary ==
 
Word alignments are an important notion introduced in word-based machine translation, and are commonly employed in phrase-based machine translation. In parallel corpora, sentences in different languages are aligned sentence by sentence, not word by word, so it is not trivial to fragment a sentence pair into smaller translation units. Word alignments address this by mapping each word in the source sentence to an equivalent word in the target sentence.
 
Several phrase-to-phrase alignment models have been proposed previously. Two recurring problems in these models are the intractability of the expectations over latent variables during the E-step of [[Expectation Maximization | EM]], and model degeneration, where the model is biased towards longer phrases rather than combining shorter phrases to form longer ones, since longer phrases incur smaller distortion and generation penalties. The first problem has been addressed using [[Gibbs sampling | Gibbs Sampling]] in [[DeNero et al, EMNLP 2008]], and the model degeneration problem has been dealt with by defining a [[Dirichlet distribution]] prior over the phrase pair distribution.
 
[[Posterior Regularization for Expectation Maximization | Posterior Regularization]] has been used to improve word-level alignment models in [[Graça et al, Computational Linguistics 2010]], by defining constraints for bijectivity and symmetry.
In our project, we will attempt to use [[Posterior Regularization for Expectation Maximization | Posterior Regularization]] (PR) to address the model degeneration problem: we will define constraints so that longer phrases are selected only when their posterior expectations are high enough.
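
As a sketch of how such a constraint could be encoded: in PR (following Ganchev et al.'s formulation), the E-step posterior is replaced by its projection onto a constrained set,

:<math>q^{(t+1)} = \arg\min_{q \in Q} \mathrm{KL}\left(q \,\|\, p_\theta(z \mid x)\right), \qquad Q = \left\{ q : E_q[\phi(x,z)] \le b \right\}</math>

where <math>\phi(x,z)</math> could, for instance, count the phrase pairs in the derivation <math>z</math> that exceed some length threshold, and <math>b</math> bounds their expected number. The phrase-length feature here is our own illustration of the kind of constraint we have in mind, not one taken from prior work.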
 
The quality of the alignments can be tested against a gold standard corpus, where the word alignments are produced by human linguists. One example of such a corpus is the [http://www.isi.edu/natural-language/download/hansard/ Hansards] corpus. Another evaluation method is to use the produced alignments in a machine translation system and verify that an improvement in translation quality is achieved when using the improved alignments.
 
Proposed by: [[User:Lingwang|Wang Ling]]
 
Comments:  so you're proposing a more structured kind of alignment model that aligns phrases to phrases.  I'm not sure why "HMM-based" is in your title; you've moved pretty far from the Vogel et al.-style models!  Further, you haven't really clearly stated what your idea is, in relation to prior work.  What will your approach do that was not done in prior papers?  It sounds like you are trying to build a generative model that has latent variables, much like others have done, and train it with posterior-regularized EM.  It sounds like you believe PR can be used in place of Gibbs sampling and priors; I don't understand how it's going to get you around the intractability of inference (the reason for using Gibbs sampling).  You also need to be clear about previous work using PR for this problem.  --[[User:Nasmith|Nasmith]] 21:10, 9 October 2011 (UTC)
 
== Baseline ==
We will use a traditional pipeline for phrase-based machine translation. We will build the word alignments and the translation models using the [http://code.google.com/p/geppetto/ Geppetto] toolkit, then tune the parameters of our model using MERT (Minimum Error Rate Training) and decode using [http://www.statmt.org/moses/ Moses].
  
The baseline alignment model will be the [[Hidden Markov Model]] defined in [http://dl.acm.org/citation.cfm?id=993313].
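
For reference, the classical HMM alignment model factors the probability of a source sentence <math>f_1^J</math> and an alignment <math>a_1^J</math> given a target sentence <math>e_1^I</math> into first-order transition and emission terms; this is the standard word-level formulation (a sketch, which may differ from the exact parameterization of the cited paper):

:<math>p(f_1^J, a_1^J \mid e_1^I) = \prod_{j=1}^{J} p(a_j \mid a_{j-1}, I) \cdot p(f_j \mid e_{a_j})</math>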
 
The code for the HMM phrasal alignments will be implemented directly in the Geppetto toolkit and uploaded to its repository after the completion of this work.
 
Depending on the size of the data sets used, multiple runs of MERT tuning will be required to stabilize the results (averaging the scores across runs).
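
For context (a standard sketch of Och-style MERT, not specific to our setup), MERT tunes the log-linear weights <math>\lambda</math> to directly minimize a corpus-level error metric of the decoder's best output:

:<math>\hat{\lambda} = \arg\min_{\lambda} \mathrm{Err}\left( \{\hat{e}(f_s; \lambda)\}_{s=1}^{S} \right), \qquad \hat{e}(f; \lambda) = \arg\max_{e} \sum_{k} \lambda_k h_k(e, f)</math>

where the <math>h_k</math> are the model's feature functions and <math>\mathrm{Err}</math> is an error metric such as <math>1 - \mathrm{BLEU}</math> over the development set. Since this objective is non-convex and the search uses random restarts, scores vary across runs, hence the averaging.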
 
== Phrasal Alignment Model ==
 
== Evaluation ==
 
The translation system will be evaluated using BLEU and METEOR, which are automatic translation quality metrics.
 
BLEU is essentially the ratio of n-grams in the translation that appear in one of the references of that translation.
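
Concretely, BLEU combines the modified n-gram precisions <math>p_n</math> (typically up to <math>n = 4</math>) using a geometric mean, scaled by a brevity penalty that penalizes translations shorter than the references; this is the standard formulation:

:<math>\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right), \qquad \mathrm{BP} = \min\left(1, e^{1 - r/c}\right)</math>

where <math>c</math> is the total length of the translations and <math>r</math> is the effective reference length.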
  
METEOR tries to align the words in the translation with the words in the references by finding exact matches, stem matches, synonyms, and paraphrases, among others.
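
In its original formulation, METEOR then scores the translation by a recall-weighted harmonic mean of unigram precision <math>P</math> and recall <math>R</math> over these matches, discounted by a fragmentation penalty (constants as in the original metric; later versions tune them):

:<math>\mathrm{METEOR} = F_{mean} \cdot (1 - \mathrm{Penalty}), \qquad F_{mean} = \frac{10 P R}{R + 9P}, \qquad \mathrm{Penalty} = 0.5 \cdot \left( \frac{\mathrm{chunks}}{\mathrm{matches}} \right)^3</math>

where <math>\mathrm{chunks}</math> is the number of contiguous matched spans and <math>\mathrm{matches}</math> the number of matched unigrams.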
  
The alignment quality can be tested using AER (Alignment Error Rate), which computes <math>1 - F_0</math>, where <math>F_0</math> is the harmonic mean of the precision and recall of the alignments. The reference must be a gold standard, i.e., word alignments annotated by humans.
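
A minimal sketch of this computation (our own illustration, treating all gold links as sure; the common Och & Ney definition of AER additionally distinguishes sure from possible links):

<pre>
def aer(predicted, gold):
    """Alignment Error Rate as defined above: 1 - F0, where F0 is the
    harmonic mean of the precision and recall of the predicted links.
    Both arguments are sets of (source_index, target_index) link pairs."""
    if not predicted or not gold:
        return 1.0
    correct = len(predicted & gold)   # links present in both sets
    if correct == 0:
        return 1.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    f0 = 2 * precision * recall / (precision + recall)
    return 1.0 - f0
</pre>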
  
== Corpora ==
 
A small-scale translation test will be conducted using the IWSLT 2010 Chinese-English DIALOG training set, consisting of 30K parallel sentences, to build the alignments and train the translation model. We will use the development and test sets from IWSLT 2006 and IWSLT 2007, with 500 parallel sentences each, to run MERT and to evaluate the results.
 
We will also test the quality of the alignments themselves using the [http://www.isi.edu/natural-language/download/hansard/ Hansards] corpus, using 1M sentences for training and 500 sentences for testing. This corpus can also be used for testing the translation quality at a larger scale.
  
Comment: it's not clear to me what human-gold-standard alignments you are going to use for the intrinsic evaluation.  --[[User:Nasmith|Nasmith]] 21:15, 9 October 2011 (UTC)
