Word Alignments

Word alignments are an important notion introduced in Word-based Machine Translation, and are commonly employed in Phrase-based machine translation.

In parallel corpora, sentences in different languages are aligned sentence by sentence, not word by word. Thus, it is not trivial to fragment a sentence pair into smaller translation units. Word alignments map each word in the source sentence $s = s_1, \dots, s_J$ to an equivalent word in the target sentence $t = t_1, \dots, t_I$. A word alignment between a source sentence $s$ and a target sentence $t$ is represented by an alignment matrix $A$, where $A_{i,j} = 1$ if $t_i$ is aligned with $s_j$. Alignments do not have to be one-to-one, since 2 words in one language can have the same meaning as 3 words in another language. However, early alignment models (e.g., IBM Model 1) impose the restriction that each target word can be aligned to only one source word, defining a one-to-many alignment. In these cases, a word alignment can also be defined by a vector $a = a_1, \dots, a_I$, where $a_i$ returns the index of the source word that $t_i$ is aligned to.
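
The two representations can be made concrete with a small sketch. The Python example below builds the alignment matrix $A$ and the equivalent one-to-many alignment vector $a$ for a hypothetical English-Portuguese sentence pair; the sentences and the alignment links are illustrative assumptions, not taken from a specific corpus or alignment model.

<pre>
# Hypothetical sentence pair (illustrative only).
source = ["the", "black", "cat"]    # source sentence s_1 .. s_J
target = ["o", "gato", "preto"]     # target sentence t_1 .. t_I

# General representation: an alignment matrix A, where A[i][j] = 1
# if target word t_i is aligned with source word s_j.
links = {(0, 0), (1, 2), (2, 1)}    # (target index, source index) pairs
A = [[1 if (i, j) in links else 0 for j in range(len(source))]
     for i in range(len(target))]

# One-to-many restriction (as in IBM Model 1): each target word is
# aligned to exactly one source word, so the alignment reduces to a
# vector a, where a[i] is the index of the source word aligned to t_i.
a = [next(j for j in range(len(source)) if A[i][j] == 1)
     for i in range(len(target))]

print(A)   # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
print(a)   # [0, 2, 1]
</pre>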