Improving SMT word alignment with binary feedback

From Cohen Courses
Revision as of 00:13, 12 September 2011 by Asaluja

Word alignment is an important sub-problem within machine translation. It addresses the task of aligning words or phrases between a sentence and its translation. The difficulty varies from relatively simple for language pairs with similar structure (e.g., English-Spanish) to fairly hard for pairs such as English-Chinese or English-Japanese. Alignment models are used in the training of SMT systems when extracting phrase pairs from a parallel corpus, as well as in the decoding stage. Hence, it is reasonable to assume that errors in the hypotheses produced by an MT system can often be attributed to errors in the alignment model.

The idea behind this project is to improve SMT performance (as evaluated by BLEU, METEOR, or another end-to-end MT metric) through binary feedback given by a user. In this setting, the MT system produces a hypothesis, which the user then judges as either a "good translation" or a "bad translation". The challenge is to incorporate this coarse form of feedback into the various models that constitute an MT system. Given the assumption above that many translation errors stem from alignment errors, it makes sense to attempt to correct them by adjusting the alignment model.

An initial approach can be based on J-LIS (Joint Learning with Indirect Supervision: "Structured Output Learning with Indirect Supervision", M. Chang et al., ICML 2010). While the particular problem instance in this case is word alignment, a principled approach could generalize to the broader problem of incorporating binary labels, online, into structured output predictors.
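
To make the idea of learning from binary labels over a hidden alignment concrete, here is a minimal sketch in Python. It is not the J-LIS algorithm itself and not the project's implementation. It only illustrates one online subgradient step on a hinge loss of the form max(0, 1 - y * max_h w·Φ(x, h)), where y is the user's good/bad judgment (+1 or -1) and h is a latent alignment. The word-pair indicator features, the exhaustive search over alignments, the learning rate, and all function names are assumptions made for illustration.

```python
from itertools import product


def features(src, tgt, alignment):
    """Count (source word, target word) pairs for one alignment.

    alignment[j] is the index of the source word aligned to target word j.
    This feature set is a hypothetical choice for illustration only.
    """
    feats = {}
    for j, i in enumerate(alignment):
        key = (src[i], tgt[j])
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(w, feats):
    """Linear score w . phi for a sparse feature dictionary."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())


def best_alignment(w, src, tgt):
    """Highest-scoring alignment by exhaustive search (toy sentences only)."""
    candidates = product(range(len(src)), repeat=len(tgt))
    return max(candidates, key=lambda a: score(w, features(src, tgt, a)))


def update(w, src, tgt, label, lr=0.1):
    """One online step on the hinge loss max(0, 1 - label * max_h w.phi(x, h)).

    label is +1 for a "good translation" and -1 for a "bad translation".
    The weights w are modified in place; the alignment used is returned.
    """
    alignment = best_alignment(w, src, tgt)
    feats = features(src, tgt, alignment)
    if label * score(w, feats) < 1.0:  # margin violated, take a step
        for k, v in feats.items():
            w[k] = w.get(k, 0.0) + lr * label * v
    return alignment


if __name__ == "__main__":
    weights = {}
    update(weights, ["la", "casa"], ["the", "house"], +1)
    print(weights)
```

A "good" judgment raises the weights of the word pairs in the current best alignment, while a "bad" judgment lowers them, so the feedback reaches the alignment model without the user ever marking individual alignment links. The approach cited above additionally combines such binary examples with directly supervised structured examples, which this sketch omits.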