Improving SMT word alignment with binary feedback

From Cohen Courses
Revision as of 17:47, 18 September 2011 by Asaluja (talk | contribs)

Team Member(s)

  • Avneesh Saluja
  • I am more than happy to partner with 1 or 2 other people on this project. Please contact me if you're interested!

Proposal

Word alignment is an important sub-problem within machine translation. It addresses the task of aligning words or phrases between a sentence and its translation. The difficulty ranges from relatively simple for language pairs with similar structure (e.g., English-Spanish) to fairly hard for pairs such as English-Chinese or English-Japanese. Alignment models are used in the training of SMT systems when extracting phrase pairs from a parallel corpus, as well as in the decoding stage. Hence, it is reasonable to assume that errors in the hypotheses produced by an MT system can often be attributed to errors in the alignment model.

The idea behind this project is to improve SMT performance (as evaluated by BLEU, METEOR, or another end-to-end MT metric) using binary feedback from a user. The MT system produces a hypothesis, which the user then judges as either a "good translation" or a "bad translation". The challenge is to incorporate this coarse form of feedback into the various models that constitute an MT system. Given the assumption above that translation errors can often be traced to alignment errors, it makes sense to attempt to correct them by adjusting the alignment model.

An initial approach can be based on J-LIS (Joint Learning with Indirect Supervision; see "Structured Output Learning with Indirect Supervision", M. Chang et al., ICML 2010). While the particular problem instance here is word alignment, a principled approach could generalize to the broader problem of incorporating binary labels, online, into structured output predictors.
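To make the idea of learning from binary feedback concrete, here is a minimal sketch of one online update. It is not the J-LIS algorithm itself and not this project's method. It only illustrates the general pattern of scoring the best candidate structure and nudging the weights when the user's label disagrees. The function name, the feature vectors, the hinge loss, and the learning rate are all assumptions made for illustration.

```python
from typing import List, Sequence


def dot(w: Sequence[float], phi: Sequence[float]) -> float:
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def binary_feedback_step(
    w: Sequence[float],
    candidates: List[Sequence[float]],
    label: int,
    lr: float = 0.1,
) -> List[float]:
    """One subgradient step on the hinge loss max(0, 1 - label * max_h w.phi(h)).

    candidates: hypothetical feature vectors, one per candidate alignment.
    label: +1 for "good translation", -1 for "bad translation".
    """
    # Pick the highest-scoring candidate structure under the current weights.
    best = max(candidates, key=lambda phi: dot(w, phi))
    margin = label * dot(w, best)
    if margin >= 1.0:
        # The margin is satisfied, so the weights are left unchanged.
        return list(w)
    # Move the weights toward (label=+1) or away from (label=-1) the best candidate.
    return [wi + lr * label * pi for wi, pi in zip(w, best)]
```

A "bad translation" judgment lowers the score of the alignment the model currently prefers, while a "good translation" judgment raises it until the margin is met. A full approach would also need a regularizer and a way to combine this signal with any directly annotated alignments.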

Dataset(s)

Baseline System

Evaluation

Evaluation for this project will be based on two metrics:

  • Alignment Error Rate (AER): an alignment-specific metric. It can only be used when the parallel corpus has gold alignment annotations, i.e., which words in the source sentence correspond to each word in the target sentence. Such annotations are available for the Hansards data, but not for the IWSLT data.
  • BLEU: a commonly used metric for evaluating machine translation quality.
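As an illustration of the first metric, the sketch below computes AER using the commonly cited definition from Och and Ney, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the hypothesized alignment, S the "sure" gold links, and P the "possible" gold links (with S treated as a subset of P). The page does not state which definition the project would use, so this choice, along with the function name and the representation of links as (source index, target index) pairs, is an assumption.

```python
from typing import Iterable, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_error_rate(
    hypothesis: Iterable[Link],
    sure: Iterable[Link],
    possible: Iterable[Link],
) -> float:
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); lower is better."""
    a = set(hypothesis)
    s = set(sure)
    # Sure links are treated as possible links too, so P always contains S.
    p = set(possible) | s
    if not a and not s:
        # Nothing to align and nothing predicted: treat as zero error.
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


if __name__ == "__main__":
    gold_sure = {(0, 0), (1, 1)}
    gold_possible = {(2, 1)}
    predicted = {(0, 0), (2, 1), (2, 2)}
    print(alignment_error_rate(predicted, gold_sure, gold_possible))
```

A hypothesis that exactly matches the sure links scores 0. Predicted links outside the possible set, and sure links that are missed, both raise the error.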

Related Work