Lacoste-Julien et al, NAACL 2006

From Cohen Courses
Jump to navigationJump to search

Being edited by Rui Correia


Simon Lacoste-Julien, Ben Taskar, Dan Klein, and Michael I. Jordan. Word alignment via quadratic assignment. In Human Language Technology–North American Association for Computational Linguistics, New York, NY, 2006. On-line


In this paper the authors address two major limitations of Taskar et al. (2005) paper on discriminative word alignment methods: fertility and first-order interactions.

The authors propose methods to extend that discriminative approach to word alignment in order to allow fertility modeling and to capture first-order interactions between alignments of consecutive words. By allowing to capture phenomena such as monotonicity, local inversion and contiguos fertility trends (highly informative for alignment) the authors enhance the expressive power of the baseline discriminative approach, while remaining computationally efficient.

Their best result consists of a relative AER reduction of 25 % over the baseline discriminative formulation. The authors claim their contribution to be particularly promising for phrase-based systems since they perform better with higher recall on the task.

Baseline Method

The baseline method used in this paper, as already mentioned, is a discriminative approach to word alignments by Taskar et al. (2005). In this model, nodes and correspond to words in the source and target language, respectively, and edges correspond to alignments between words. The edge weights represent the degree to which word in one sentence can be translated using word in the other sentence. The predicted alignments are then chosen by maximizing the sum of the edge scores, which can be formulated in linear programming as:


where are relaxations of the binary variables that indicate if is assigned to .


In the baseline model, a word could only align to at most one word in the translation. This limitation implies that cases such as backbone and épine dorsal (in French) can not be inferred. To solve this problem, one could simply increase the right hand side for the constraints in the baseline model from to , where is the maximum allowed fertility. However, this would cause that maximum weight solutions would have either all words with fertility or .

Instead, the authors defined a means of encouraging the common case of low fertility, allowing only higher fertilities when it is licensed, introducing penalties in the model for those high fertilities. Those penalties are modeled introducing variables and (meaning that node (or ) has fertility of at least ). Subsequently, the model formulation in linear programming comes as:


where represents the penalty for fertility of node , and each is the penalty increment from increasing the fertility from to .

First-Order Interaction

The second major limitation of the baseline model this paper addresses is the fact that the edges only interact indirectly through the competition induced by the constraints, in contrast with directly modeling correlations between alignments of consecutive words (such as in Vogal et al, COLING 1996).

The authors suggest this can be accomplished by modeling the common local alignment configurations by adding bonuses for pairs of edges, considering three examples:

  • encourage strictly monotonic alignments
  • local inversion of nouns and adjectives
  • a word in one language can be translated as a phrase in the other language

Formally, variables named , along with correspondent scores , are added to the model indicating whether both edges and are in the alignment. Additionally, the pair of constraints and is added to reinforce the semantics . In linear programming:


where .

Experimental Results

The authors used the English-French Hansards dataset to apply their models, comparing the baseline method with successive augmentations, achieving a 25% AER improvement over the baseline. The figure below show the Precision, Recall and Alignment Error Rate for the baseline model alone (M), baseline and fertility (M + F), baseline and interactions (M +Q), baseline, fertility and interactions (M + F + Q) and adding the IBM Model 4 predictions (M + F + Q + IBM4).