== Citation ==

S. Ravi and K. Knight. 2011. Deciphering Foreign Language. In Proceedings of ACL.

== Online version ==

pdf

== Summary ==

This work addresses the Machine Translation problem without resorting to parallel training data.

This is done by looking at the Machine Translation task from a decipherment perspective, where a sentence in the source language is viewed as a sentence in the target language, but encoded in some unknown symbols.

Experiments showed that, while results using monolingual data were considerably lower than those using the same amount of bilingual data, large amounts of monolingual data can be used to build models that perform similarly to systems trained on smaller amounts of bilingual data. This is encouraging, since bilingual data is a scarce resource for most language pairs and domains, while monolingual data is far more abundant.

== Description of the Method ==

Word alignment using parallel corpora is viewed as a maximization problem with latent word alignments <math>a</math> for a set of sentence pairs <math>(s,t)</math>, given by:

<math>\hat{\theta} = \arg\max_{\theta} \prod_{(s,t)} P_{\theta}(s|t) = \arg\max_{\theta} \prod_{(s,t)} \sum_{a} P_{\theta}(s,a|t)</math>

where <math>\theta</math> are the translation parameters of the model.
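For illustration, the sketch below computes the alignment-marginalized likelihood <math>\sum_a P_\theta(s,a|t)</math> under an IBM Model 1-style independence assumption, where each source word aligns to one target word and the sum over alignments factorizes into a product of per-word sums; the toy translation table is hypothetical, not taken from the paper.

<source lang="python">
def marginal_likelihood(src, tgt, t_table):
    """Sum over alignments of P(s, a | t) under a Model 1-style model.

    With independent per-word alignments the sum factorizes into
    prod_j sum_i t(s_j | t_i); length normalization is omitted."""
    p = 1.0
    for s_word in src:
        p *= sum(t_table.get((s_word, t_word), 1e-9) for t_word in tgt)
    return p

# Hypothetical toy lexical table t(s|t)
t_table = {("maison", "house"): 0.8, ("bleue", "blue"): 0.7,
           ("maison", "blue"): 0.05, ("bleue", "house"): 0.05}
print(marginal_likelihood(["maison", "bleue"], ["blue", "house"], t_table))
</source>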

When only monolingual corpora are used, there is no target sentence aligned to each source sentence <math>s</math>. Thus, like the word alignments, this work views the hidden target sentence <math>t</math> as an additional latent variable. Hence, the previous equation can be rewritten as:

<math>\hat{\theta} = \arg\max_{\theta} \prod_{s} \sum_{t} P(t) \sum_{a} P_{\theta}(s,a|t)</math>

where <math>P(t)</math> is the probability of a target sentence <math>t</math>, modeled by a language model. The large number of latent variables generated by this model is tackled using [[usesMethod::Gibbs sampling]].
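To make this concrete, here is a minimal sketch of one Gibbs step over the hidden target sentence, assuming a bigram language model and a simple word-for-word channel model; this illustrates the general technique, not the paper's implementation, and all names (bigram, t_table) are hypothetical. Since enumerating all candidate target sentences is exponential in sentence length, each target position is resampled in turn conditioned on the rest.

<source lang="python">
import random

def gibbs_resample_position(tgt, j, tgt_vocab, bigram, t_table, src):
    """Draw a new target word for position j from P(t_j | t_-j, s),
    proportional to the local bigram score times the channel
    probability t(s_j | t_j)."""
    scores = []
    for cand in tgt_vocab:
        left = tgt[j - 1] if j > 0 else "<s>"
        right = tgt[j + 1] if j + 1 < len(tgt) else "</s>"
        lm = bigram.get((left, cand), 1e-6) * bigram.get((cand, right), 1e-6)
        channel = t_table.get((src[j], cand), 1e-9)
        scores.append(lm * channel)
    # sample a candidate proportionally to its score
    r = random.uniform(0.0, sum(scores))
    for cand, w in zip(tgt_vocab, scores):
        r -= w
        if r <= 0.0:
            tgt[j] = cand
            break
    return tgt
</source>

A full sampler would sweep <math>j</math> over every position for many iterations, re-estimating the channel parameters from the sampled source and target pairs.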

As for the translation model <math>P_{\theta}(s,a|t)</math>, two models are presented. The first is a simple model that accounts for word substitutions, insertions, deletions and local reordering, but does not incorporate word fertility and global reordering as in IBM Model 3.
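A rough sense of this first channel model can be given in code. The sketch below is an assumption for illustration, not the paper's exact parameterization: it scores one derivation of a source sentence from a target sentence as a product of substitution, insertion, deletion and adjacent-swap probabilities, with no fertility or global reordering.

<source lang="python">
def score_derivation(ops, p_sub, p_ins, p_del, p_swap=0.05):
    """Score a derivation t -> s as a product over edit operations:
    ("SUB", t_word, s_word), ("INS", s_word), ("DEL", t_word),
    and ("SWAP",) for swapping two adjacent words (local reordering)."""
    p = 1.0
    for op in ops:
        if op[0] == "SUB":
            p *= p_sub.get((op[1], op[2]), 1e-9)
        elif op[0] == "INS":
            p *= p_ins.get(op[1], 1e-9)
        elif op[0] == "DEL":
            p *= p_del.get(op[1], 1e-9)
        elif op[0] == "SWAP":
            p *= p_swap
    return p

# Hypothetical derivation of "maison bleue" from "blue house":
# swap the adjacent words, then substitute each one.
ops = [("SWAP",), ("SUB", "house", "maison"), ("SUB", "blue", "bleue")]
p_sub = {("house", "maison"): 0.8, ("blue", "bleue"): 0.7}
print(score_derivation(ops, p_sub, p_ins={}, p_del={}))
</source>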

== Experimental Results ==

== Related Work ==