Difference between revisions of "Bansal et al, ACL 2011"

Revision as of 17:15, 25 October 2011

Note

still incomplete...

Citation

M. Bansal, C. Quirk, and R. Moore. 2011. Gappy phrasal alignment by agreement. In Proceedings of ACL.

Online version

Summary

This work defines a phrase-to-phrase alignment model for Statistical Machine Translation. The model is an HMM that builds on the work presented in Vogel et al, COLING 1996, extending it to allow both contiguous and discontinuous (gappy) phrases.

The quality of the alignments is further improved by employing the alignment agreement described in [Liang et al, 2006], where the alignment models for the two directions are trained with a joint objective function, rather than being combined afterwards through symmetrization.

Experimental results show improvements in AER (Alignment Error Rate) over the work in [Liang et al, 2006]. Translation quality, evaluated using BLEU, also improved over the same baseline.

Description of the Method

The method extends the work in Vogel et al, COLING 1996, where a word-to-phrase alignment model was presented. Two extensions to this model are proposed.

The first extension is to allow phrasal alignments, where multiple source words can be aligned with multiple target words. This makes the model semi-Markov, since each state (an alignment between phrases) can emit more than one observation (target word) at each time step, as opposed to the previous work using a regular HMM, where each target word can be aligned with at most one source word.

The second extension allows alignments using phrases with gaps to be modeled, where a phrase with a gap is the sequence $w_{s}*w_{f}$ , where $w_{s}$ is the starting word, $w_{f}$ is the final word and "*" can be any number of words. Furthermore, the alignment agreement work presented in [Liang et al, 2006] was employed and extended to the new space of alignments (alignments including gappy phrases), to substantially reduce overfitting.
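To make the semi-Markov idea concrete, the following minimal sketch enumerates the ways a target sentence of J words can be split into K contiguous phrases, each represented by its end position. The function name and the toy sizes are assumptions chosen for illustration and are not taken from the paper.

```python
from itertools import combinations


def segmentations(num_target_words, num_phrases):
    """Yield every split of positions 1..J into K contiguous phrases.

    Each segmentation is a tuple of phrase end positions; the last
    entry is always J because the final phrase ends the sentence.
    """
    J, K = num_target_words, num_phrases
    # Pick K-1 interior boundaries among positions 1..J-1.
    for cuts in combinations(range(1, J), K - 1):
        yield cuts + (J,)


# Four target words grouped into two phrases.
two_phrase_splits = list(segmentations(4, 2))
# One phrase per word: the only segmentation a regular HMM considers.
one_word_per_state = list(segmentations(3, 3))
```

With K equal to J every phrase has length one, which corresponds to the regular HMM case; smaller K forces at least one state to emit several target words.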

Thus, the generative model takes the following form:

$p(A,L,O|S)=p_{l}(J|I)p_{f}(K|J)\prod _{k=1}^{K}p_{j}(a_{k}|a_{k-1})p_{t}(l_{k},o_{l_{k-1}+1}^{l_{k}}|S[a_{k}],l_{k-1})$

Where $p_{l}(J|I)$ is a uniform distribution modeling the length of the observation sequence $J$ given the number of words $I$ on the state side (source words).

$p_{f}(K|J)$ is a distribution that models the number of states given the number of observation words (in other words, how the target words are grouped into phrases). This distribution is modeled by $\eta ^{(J-K)}$ , where $\eta$ parametrizes a penalty on short state sequences made of long phrases, that is, sequences whose number of phrases $K$ is much smaller than the number of target words $J$ .
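As a small numeric illustration of this term, the sketch below evaluates the unnormalized weight $\eta^{(J-K)}$ for a few phrase counts. The value 0.5 for $\eta$ is an assumption picked only to make the numbers easy to read; it is not the value used in the paper.

```python
def phrase_count_weight(J, K, eta):
    """Unnormalized weight eta ** (J - K) for K phrases over J target words."""
    if not 1 <= K <= J:
        raise ValueError("need 1 <= K <= J")
    return eta ** (J - K)


ETA = 0.5  # hypothetical penalty value, for illustration only
J = 6      # six target words

weight_all_single_words = phrase_count_weight(J, 6, ETA)  # 0.5 ** 0
weight_three_phrases = phrase_count_weight(J, 3, ETA)     # 0.5 ** 3
weight_one_long_phrase = phrase_count_weight(J, 1, ETA)   # 0.5 ** 5
```

Under this assumed value, a segmentation with one phrase per word is not penalized at all, while covering the whole sentence with a single long phrase receives the smallest weight.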

$p_{j}(a_{k}|a_{k-1})$ is a probability distribution for state transitions with a first-order Markov assumption.

Finally, $p_{t}(l_{k},o_{l_{k-1}+1}^{l_{k}}|S[a_{k}],l_{k-1})$ is the translation probability of the target phrase $o_{l_{k-1}+1}^{l_{k}}$ , which starts in position $l_{k-1}+1$ and ends in position $l_{k}$ , given the end position $l_{k-1}$ of the previous phrase and the aligned source phrase $S[a_{k}]$ . The alignment variable $a_{k}$ is defined as a triple $(i,j,g)$ , where $i$ and $j$ are the starting and ending positions of the source words the target phrase is aligned to, and $g$ defines whether the source phrase from $i$ to $j$ is a contiguous or a gappy phrase. For instance, $S[2,4,CONTIG]$ can represent the phrase "ne peux pas", while $S[2,4,GAP]$ represents "ne * pas", where "*" is a gap.
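The alignment triple can be sketched as a small data structure. In the example below, the class name, the helper function and the four-word source sentence are all assumptions made for illustration; only the triple $(i,j,g)$ and the two "ne ... pas" readings come from the description above. The sketch also assumes that a gappy phrase needs distinct start and end words.

```python
from typing import List, NamedTuple


class AlignState(NamedTuple):
    i: int  # 1-based start position in the source sentence
    j: int  # 1-based end position in the source sentence
    g: str  # "CONTIG" or "GAP"


def source_phrase(source: List[str], a: AlignState) -> List[str]:
    """Return the source words selected by S[a], with "*" marking a gap."""
    if not 1 <= a.i <= a.j <= len(source):
        raise ValueError("positions out of range")
    if a.g == "CONTIG":
        # Every word from position i through position j.
        return source[a.i - 1 : a.j]
    if a.g == "GAP":
        # Assumption: a gappy phrase has distinct start and end words.
        if a.i == a.j:
            raise ValueError("a gappy phrase needs i < j")
        return [source[a.i - 1], "*", source[a.j - 1]]
    raise ValueError("g must be 'CONTIG' or 'GAP'")


sentence = ["je", "ne", "peux", "pas"]  # hypothetical source sentence
contiguous = source_phrase(sentence, AlignState(2, 4, "CONTIG"))
gappy = source_phrase(sentence, AlignState(2, 4, "GAP"))
```

Both states cover the same span of positions; the flag alone decides whether the word in between ("peux") belongs to the phrase.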

Experimental Results

Tests were conducted by evaluating the quality of the produced alignments using AER (Alignment Error Rate) and on the translation quality using BLEU.
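This summary does not spell out how AER is computed. The sketch below uses the standard definition with sure (S) and possible (P) gold links, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), and assumes that this is the variant used in the experiments. The function name and the toy links are illustrative only.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), lower is better.

    Links are (source_position, target_position) pairs. Sure links are
    treated as possible links too, following the usual convention.
    """
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    if not a and not s:
        raise ValueError("AER is undefined when both A and S are empty")
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Toy gold annotation and system output (made-up links).
sure_links = {(1, 1), (2, 2)}
possible_links = {(3, 2)}
system_links = {(1, 1), (3, 2), (3, 3)}

aer = alignment_error_rate(system_links, sure_links, possible_links)
perfect = alignment_error_rate(sure_links, sure_links, possible_links)
```

In the toy case the system recovers one of two sure links and proposes one link outside the gold annotation, so its error rate is well above the zero obtained when the output matches the sure links exactly.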

Two datasets were used. For the English-French pair, the Hansards dataset was used, which contains around 1.1 million training sentence pairs, and the system was tested using the NAACL 2003 shared-task dataset. For the German-English pair, the EUROPARL data was used, which contains around 1.6 million training sentences, and the translation quality was evaluated using the WMT2010 translation task data.

The baseline used for this work is the system described in [Liang et al, 2006].

In terms of AER, the inclusion of contiguous segments showed consistent improvements, and some additional gains are observed by including gappy phrases. These gains are observed with both posterior decoding and Viterbi decoding.

Corpus	Language Pair	Words	Vocabulary	Description
Avalanche Bulletins	French-German	French: 62849, German: 44805	French: 1993, German: 2265	Avalanche bulletins published by the Swiss Federal Institute for Snow and Avalanche Research
Verbmobil	Spanish-English	Spanish: 13768, English: 15888	Spanish: 2008, English: 1830	Spontaneous spoken dialog in the domain of appointment scheduling
EuTrans	German-English	German: 150279, English: 154727	German: 4017, English: 2443	Typical phrases in the tourism and travel domain

In terms of BLEU, consistent improvements can also be observed using the alignments with gappy phrases.

Related Work

The work in Marcu and Wong, EMNLP 2002, describes a joint probability distribution, which is used and extended in this work.

