An End-to-End Discriminative Approach to Machine Translation

From Cohen Courses

Citation

paper

Summary

This paper presents a discriminative approach to learning a translation model from parallel sentences. The translation task is viewed as the problem of finding the derivation h that maximizes the translation score from the source sentence s to the target sentence t. This score is computed as a weighted combination of features, which is one of the main contributions of this paper. Another major contribution is the parameter training method, which is based on the perceptron algorithm. Here, the paper shows that updating the parameters locally, so that no radical changes are made to the current translation at each step, performs better than radically changing the translation to match the reference at each update.

Translation Model

The model used in this work is typical of statistical Machine Translation: the translation task is viewed as a structured prediction problem, where each source segment is mapped to a target segment according to a given model. In the case of MT, a hidden variable h is introduced that describes the sequence of derivations used to translate each source phrase in the source sentence s into a target phrase, yielding a translation t. Thus, we want to find the derivation h (and hence the translation t) that maximizes the model's score:

(t, h) = argmax over (t, h) of w · Φ(s, t, h)

The score is given by the dot product of 2 components: the feature vector Φ(s, t, h) and the weight vector w.
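The linear scoring function can be sketched as a sparse dot product. This is a minimal illustration, not the paper's implementation; the feature names and values below are hypothetical.

```python
# Minimal sketch of the linear model: score = w · Φ(s, t, h).
# Feature names and values are hypothetical illustrations.

def score(weights, features):
    """Dot product of the weight vector and a sparse feature vector."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

w = {"translation_log_prob": 1.0, "lm_log_prob": 0.5, "phrase_count": -0.1}
phi = {"translation_log_prob": -2.3, "lm_log_prob": -4.0, "phrase_count": 3}
print(score(w, phi))  # -2.3*1.0 + -4.0*0.5 + 3*-0.1 = -4.6
```

In decoding, this score would be computed for candidate derivations and the highest-scoring one returned as the translation.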

Perceptron-based Training

Training a good set of weights w is a challenging task in MT, because unlike in tasks such as POS tagging, there are multiple correct translations for each source sentence. This work proposes a perceptron-based approach to learn the weights from a parallel data set. The adopted update rule is as follows:

w ← w + Φ(s, t_target, h_target) − Φ(s, t_predicted, h_predicted)

where Φ(s, t_target, h_target) is the feature vector of the update target (a reference-like translation), and Φ(s, t_predicted, h_predicted) is the feature vector of the model's current best output under the current set of weights.

Thus, at each update, we would like to adjust the parameters so that the model's output leans towards the reference. One of the main contributions of this work is the study of different strategies for choosing the target of this update. The simplest approach, which was shown not to be very effective, is to choose as target the derivation with the highest model score among the derivations that produce the reference exactly. This is called Bold updating. Another possible approach is to use the current parameters to generate an n-best list of possible translations and choose as target the one with the highest BLEU score. This is called Local updating. Finally, the Hybrid updating approach combines these two by using the bold update only when the reference translation is within the n-best list.
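The update rule and the Local updating strategy can be sketched together: the target is the n-best candidate with the highest BLEU, the prediction is the model's current 1-best. This is a toy sketch under assumed data structures (sparse dicts, n-best as (features, BLEU) pairs), not the paper's code.

```python
# Sketch of one perceptron update with the Local updating strategy.
# nbest is a hypothetical list of (feature_dict, bleu) pairs,
# ordered by the model's current score (best first).

def perceptron_update(weights, target_features, predicted_features, lr=1.0):
    """w <- w + lr * (Phi(target) - Phi(prediction))."""
    for name, value in target_features.items():
        weights[name] = weights.get(name, 0.0) + lr * value
    for name, value in predicted_features.items():
        weights[name] = weights.get(name, 0.0) - lr * value
    return weights

def local_update(weights, nbest, lr=1.0):
    """Local updating: target = highest-BLEU candidate in the n-best list."""
    predicted = nbest[0][0]                      # model's current 1-best
    target = max(nbest, key=lambda c: c[1])[0]   # candidate with highest BLEU
    return perceptron_update(weights, target, predicted, lr)

w = {"f1": 0.0}
nbest = [({"f1": 1.0}, 0.2), ({"f1": 3.0}, 0.9)]
w = local_update(w, nbest)
print(w)  # {'f1': 2.0}
```

Note that if the 1-best already has the highest BLEU, target and prediction coincide and the update is zero, which is exactly the "no radical change" behavior the paper favors.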

Experimental results show that the Local update strategy achieves the highest BLEU score, 34.7, while the other approaches perform worse by approximately 1 BLEU point.

Feature Space

Up until now we talked about how to train the weight parameters, which weight each of the features. The second main contribution of this work is to propose features that contribute to a better translation quality. These are distributed into the following categories:

  • Blanket features - These features include the translation probability and language model probability that are commonly used in the SMT field.
  • Lexical features - Finer-grained features designed to fix some common problems with the previous features, such as certain n-grams that are frequently translated spuriously.
  • POS features - These features are used to identify good translations from bad translations based on the POS tag sequence of the translation.
  • Alignment Constellation features - These features measure how good a phrase pair is given its word alignment. Specific alignment patterns, called constellations, are identified by these features.
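A toy sketch of how features from these categories might populate a single sparse vector follows. All feature names and helper inputs are illustrative assumptions, not taken from the paper.

```python
# Toy sketch of assembling a sparse feature vector from the categories above.
# Feature names and inputs are illustrative, not from the paper.
import math

def extract_features(translation_prob, lm_prob, ngrams, pos_tags):
    phi = {
        # Blanket features: standard model probabilities (in log space)
        "translation_log_prob": math.log(translation_prob),
        "lm_log_prob": math.log(lm_prob),
    }
    # Lexical features: one indicator/count per n-gram in the output
    for ng in ngrams:
        phi["ngram=" + ng] = phi.get("ngram=" + ng, 0.0) + 1.0
    # POS features: indicator for the tag sequence of the translation
    phi["pos=" + " ".join(pos_tags)] = 1.0
    return phi

phi = extract_features(0.1, 0.01, ["the house", "house is"], ["DT", "NN", "VBZ"])
print(sorted(phi))
```

Keeping every category in one sparse vector is what lets the single weight vector w trade the feature groups off against each other during perceptron training.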

Experimental results show that the baseline result using only Blanket features, 28.4 BLEU, can be improved to 29.2 by adding lexical features, and further to 29.6 by adding POS features on top of the previous ones.

Finally, minor improvements are achieved by adding the Alignment Constellation features.

Related Work