Ravi and Knight, ACL 2011
Revision as of 16:59, 26 October 2011
Citation
S. Ravi and K. Knight. 2011. Deciphering Foreign Language. In Proceedings of ACL.
Online version
Summary
This work addresses the Machine Translation problem without resorting to parallel training data.
This is done by treating Machine Translation as a decipherment task: a sentence in the source language is viewed as a sentence in the target language that has been encoded in some unknown symbols.
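Under this view, a standard decipherment objective keeps a target-language model P(e) fixed and learns channel parameters θ that maximize the likelihood of the observed source text, Π_f Σ_e P(e)·P_θ(f|e). The Python code here is a minimal sketch of that idea on the simplest possible case, assuming a one-to-one word substitution cipher, a hand-written bigram language model, and plain EM via the forward-backward algorithm. It is not the authors' system. Real translation must also deal with reordering, insertions, deletions and large vocabularies, which this sketch ignores. The function names (`forward_backward`, `em_decipher`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def forward_backward(cipher, vocab, start, trans, emit):
    """Posteriors P(e_t = e | cipher) and log P(cipher) for one cipher sentence.

    Hidden states are plaintext words. The bigram LM (start, trans) is fixed,
    and emit[e][c] is the channel probability P(c | e) being learned.
    """
    T = len(cipher)
    alphas, scales = [], []
    for t, c in enumerate(cipher):
        cur = {}
        for e in vocab:
            prior = start[e] if t == 0 else sum(alphas[t - 1][p] * trans[p][e] for p in vocab)
            cur[e] = prior * emit[e][c]
        s = sum(cur.values())  # equals P(c_t | c_<t); normalising avoids underflow
        alphas.append({e: v / s for e, v in cur.items()})
        scales.append(s)
    # Backward pass, rescaled with the same per-step factors.
    betas = [None] * T
    betas[-1] = {e: 1.0 for e in vocab}
    for t in range(T - 2, -1, -1):
        nxt = cipher[t + 1]
        betas[t] = {
            e: sum(trans[e][n] * emit[n][nxt] * betas[t + 1][n] for n in vocab) / scales[t + 1]
            for e in vocab
        }
    posteriors = [{e: alphas[t][e] * betas[t][e] for e in vocab} for t in range(T)]
    return posteriors, sum(math.log(s) for s in scales)


def em_decipher(corpus, vocab, cipher_vocab, start, trans, iterations=20):
    """Learn P(c | e) with EM while the plaintext language model stays fixed."""
    # Uniform initial channel model.
    emit = {e: {c: 1.0 / len(cipher_vocab) for c in cipher_vocab} for e in vocab}
    history = []  # log-likelihood of the corpus before each M-step
    for _ in range(iterations):
        counts = {e: defaultdict(float) for e in vocab}
        total_ll = 0.0
        # E-step: expected counts of (plaintext word, cipher symbol) pairs.
        for sent in corpus:
            post, ll = forward_backward(sent, vocab, start, trans, emit)
            total_ll += ll
            for c, gamma in zip(sent, post):
                for e in vocab:
                    counts[e][c] += gamma[e]
        history.append(total_ll)
        # M-step: renormalise counts; rows with no mass keep their old values.
        for e in vocab:
            z = sum(counts[e].values())
            if z > 0:
                emit[e] = {c: counts[e][c] / z for c in cipher_vocab}
    return emit, history


# Hypothetical toy setup: a tiny English bigram LM and a cipher built with the
# key the->q, cat->z, sat->k, ran->m (the learner never sees this key).
vocab = ["the", "cat", "sat", "ran"]
start = {"the": 0.9, "cat": 0.04, "sat": 0.03, "ran": 0.03}
trans = {
    "the": {"the": 0.05, "cat": 0.85, "sat": 0.05, "ran": 0.05},
    "cat": {"the": 0.05, "cat": 0.05, "sat": 0.5, "ran": 0.4},
    "sat": {"the": 0.7, "cat": 0.1, "sat": 0.1, "ran": 0.1},
    "ran": {"the": 0.7, "cat": 0.1, "sat": 0.1, "ran": 0.1},
}
corpus = [["q", "z", "k"], ["q", "z", "m"], ["q", "z", "k", "q", "z", "m"]]
emit, history = em_decipher(corpus, vocab, ["q", "z", "k", "m"], start, trans)

# Decode each cipher sentence by picking the most probable plaintext word per position.
for sent in corpus:
    post, _ = forward_backward(sent, vocab, start, trans, emit)
    print(" ".join(max(g, key=g.get) for g in post))
```

Because the language model is held fixed, EM can only move probability mass within the channel table, so it is the plaintext statistics that break the symmetry between cipher symbols. As with any exact EM procedure, the log-likelihoods recorded in `history` should never decrease.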
Experiments showed that, while results using monolingual data were considerably lower than those using bilingual data when the same amount of data is used, large amounts of monolingual data can produce models that perform similarly to systems trained on smaller amounts of bilingual data. This is encouraging, since bilingual data is a scarce resource for most language pairs and domains.