Machine Transliteration
Citation
"Machine Transliteration", K. Knight and J. Graehl, CL 1998
Online Version
An online version of the paper is available in the ACL Anthology.
Summary
This paper examines using weighted finite-state transducers (WFSTs) to solve the problem of transliteration in machine translation. Transliteration is the rendering of proper names and technical terms across writing systems on the basis of pronunciation rather than meaning, and some language pairs and directions make this much harder than others. The paper specifically examines Japanese-English transliteration.
The Problem
Japanese has a much smaller sound inventory than English and writes foreign words in the katakana syllabary, so transliterating a proper name means mapping the English pronunciation onto the nearest Japanese one. For example, Japanese does not distinguish 'L' from 'R' or 'F' from 'H', so "golf" becomes "gorufu". Because information is lost in this direction, English-to-Japanese transliteration is comparatively easy, while Japanese-to-English back-transliteration is significantly more difficult and less forgiving. The authors also deal with handwritten text, so OCR errors are introduced as well.
The Method
They divide the problem into five steps, each of which is defined as a probabilistic model:
- An English phrase is written. This is the prior probability of the English word sequence, P(w).
- The English phrase is pronounced (transliteration maps pronunciations, not spellings). This is the probability of a pronunciation given the word, or P(e|w).
- The pronunciation is converted into Japanese sounds. This is the probability of a Japanese sound sequence given the English one, or P(j|e).
- The sounds are converted to katakana (the alphabet used for foreign or technical words). This is the probability of the katakana given the sounds, or P(k|j).
- The katakana is written by hand. This is the probability of the observed (OCR-read) katakana given the true phrase, or P(o|k).
P(w) is given as a standard WFSA, while the other probabilities are given by WFSTs. These models are then composed together to produce a large WFST for doing transliteration.
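To make the decomposition concrete, here is a minimal sketch (not the authors' implementation) that scores candidate chains by multiplying the five model probabilities, just as the composed WFST does along each of its paths. All words, pronunciations, and probability values below are invented for illustration:

```python
import math

# Toy stand-ins for the five models; every entry below is invented.
P_w = {"masters": 0.006, "marsters": 0.001}                    # P(w): word prior
P_e_given_w = {("masters", "M AE S T ER Z"): 1.0,              # P(e|w)
               ("marsters", "M AA R S T ER Z"): 1.0}
P_j_given_e = {("M AE S T ER Z", "m a s u t a a z u"): 0.5,    # P(j|e)
               ("M AA R S T ER Z", "m a s u t a a z u"): 0.4}
P_k_given_j = {("m a s u t a a z u", "マスターズ"): 0.9}        # P(k|j)
P_o_given_k = {("マスターズ", "マスタース"): 0.05}              # P(o|k): OCR noise

def chain_logprob(w, e, j, k, o):
    """Log of P(w)·P(e|w)·P(j|e)·P(k|j)·P(o|k) for one hypothesized chain."""
    terms = [P_w.get(w, 0.0), P_e_given_w.get((w, e), 0.0),
             P_j_given_e.get((e, j), 0.0), P_k_given_j.get((j, k), 0.0),
             P_o_given_k.get((k, o), 0.0)]
    return sum(math.log(t) for t in terms) if all(terms) else float("-inf")

def decode(observed, chains):
    """Pick the English word whose chain best explains the observed katakana."""
    return max(chains, key=lambda c: chain_logprob(*c, observed))[0]

chains = [("masters",  "M AE S T ER Z",  "m a s u t a a z u", "マスターズ"),
          ("marsters", "M AA R S T ER Z", "m a s u t a a z u", "マスターズ")]
print(decode("マスタース", chains))  # -> masters
```

In the actual system the maximization runs over every path of the composed WFST via graph search, rather than over an enumerated hypothesis list.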
- English Word Sequences
They built this model from simple unigram word frequencies. They also built a separate model restricted to personal names.
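A unigram WFSA of this kind boils down to assigning each word its relative frequency; a minimal sketch with an invented toy corpus:

```python
from collections import Counter

# Toy corpus; the paper derives these counts from a large English corpus.
tokens = "angela merkel met angela johnson in washington".split()
counts = Counter(tokens)
total = sum(counts.values())

def p_w(words):
    """Unigram P(w): product of the relative frequencies of the words."""
    p = 1.0
    for w in words:
        p *= counts[w] / total
    return p

print(p_w(["angela", "johnson"]))  # (2/7) * (1/7)
```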
- English Words to English Sounds
They built this WFST from the CMU pronunciation dictionary.
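The CMU dictionary is a plain-text file with one `WORD PHONEMES...` entry per line; here is a sketch of loading it into the word-to-pronunciations table that such a WFST encodes (the file path is an assumption, and stress digits are stripped here for simplicity):

```python
import re
from collections import defaultdict

def load_cmudict(path="cmudict.dict"):  # path is an assumption
    """Map each word to its list of pronunciations (phoneme sequences)."""
    prons = defaultdict(list)
    with open(path, encoding="latin-1") as f:
        for line in f:
            if line.startswith(";;;"):            # comment lines
                continue
            word, phones = line.strip().split(None, 1)
            word = re.sub(r"\(\d+\)$", "", word)  # drop variant markers like WORD(2)
            phones = [re.sub(r"\d", "", p) for p in phones.split()]  # strip stress
            prons[word.lower()].append(phones)
    return prons

# prons = load_cmudict()
# prons["knight"] -> [["N", "AY", "T"]]
```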
- English Sounds to Japanese Sounds
This WFST was built using a dictionary of 8,000 English-Japanese pronunciation pairs. They used EM to train alignments from each English pronunciation symbol to one or more Japanese pronunciation symbols. They specifically disallowed alignments in which an English symbol maps to no Japanese symbols, both because allowing them increased computation time significantly and because they introduce potentially harmful alignments.
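A compact sketch of this style of EM training, under the simplifying assumption that alignments are monotone and each English phoneme emits one to three Japanese phonemes (zero-phoneme alignments excluded, as in the paper); the dynamic program sums expected counts over all such alignments:

```python
from collections import defaultdict

MAX_SPAN = 3  # each English phoneme aligns to 1..3 Japanese phonemes (never 0)

def expected_counts(e, j, prob):
    """Expected (English phoneme, Japanese span) counts for one pronunciation
    pair, summing over all monotone one-to-many alignments (forward-backward)."""
    n, m = len(e), len(j)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]  # alpha[i][t]: e[:i] -> j[:t]
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = beta[n][m] = 1.0
    for i in range(1, n + 1):
        for t in range(1, m + 1):
            for s in range(1, min(MAX_SPAN, t) + 1):
                alpha[i][t] += alpha[i-1][t-s] * prob[e[i-1], tuple(j[t-s:t])]
    for i in range(n - 1, -1, -1):
        for t in range(m - 1, -1, -1):
            for s in range(1, min(MAX_SPAN, m - t) + 1):
                beta[i][t] += prob[e[i], tuple(j[t:t+s])] * beta[i+1][t+s]
    counts, z = defaultdict(float), alpha[n][m]
    if z == 0.0:  # pair cannot be aligned under the span limit
        return counts
    for i in range(1, n + 1):
        for t in range(1, m + 1):
            for s in range(1, min(MAX_SPAN, t) + 1):
                span = tuple(j[t-s:t])
                counts[e[i-1], span] += alpha[i-1][t-s] * prob[e[i-1], span] * beta[i][t] / z
    return counts

def em(pairs, iterations=10):
    """Train P(Japanese span | English phoneme) from pronunciation pairs."""
    prob = defaultdict(lambda: 0.01)  # any positive initialization works
    for _ in range(iterations):
        counts = defaultdict(float)
        for e, j in pairs:
            for key, c in expected_counts(e, j, prob).items():
                counts[key] += c
        totals = defaultdict(float)  # M-step: normalize per English phoneme
        for (eph, _), c in counts.items():
            totals[eph] += c
        prob = defaultdict(float, {k: c / totals[k[0]] for k, c in counts.items()})
    return prob

# Toy pair: English "night" (N AY T) vs. Japanese sounds n-a-i-t-o.
model = em([(["N", "AY", "T"], ["n", "a", "i", "t", "o"])])
```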
- Japanese Sounds to Katakana Words
This was a manually produced WFST, based on both corpus knowledge and knowledge from a Japanese textbook.
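Conceptually, such rules map Japanese sound subsequences to katakana symbols. Below is a tiny illustrative fragment: the katakana values are real, but the rule set and the deterministic greedy matcher are simplifications of the paper's probabilistic WFST:

```python
# A fragment of a sound-to-katakana table; illustrative, not the paper's rules.
SOUND_TO_KANA = {
    ("m", "a"): "マ", ("s", "u"): "ス", ("t", "a"): "タ",
    ("z", "u"): "ズ", ("a",): "ア", ("n",): "ン",
}

def sounds_to_katakana(sounds):
    """Greedy longest-match conversion of Japanese sounds to katakana."""
    out, i = [], 0
    while i < len(sounds):
        for span in (2, 1):
            chunk = tuple(sounds[i:i + span])
            if len(chunk) == span and chunk in SOUND_TO_KANA:
                out.append(SOUND_TO_KANA[chunk])
                i += span
                break
        else:
            raise ValueError(f"no rule covers {sounds[i:]}")
    return "".join(out)

print(sounds_to_katakana(["m", "a", "s", "u", "t", "a"]))  # -> マスタ
```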
- Katakana Text to Handwritten Katakana
This was trained, again using EM, on 19,500 instances of handwritten katakana characters paired with their output from an OCR system.
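Given paired truth/OCR strings, the channel model reduces to a character confusion matrix. A minimal sketch that assumes one-to-one character alignment (the paper's EM handles the general case, where OCR may insert or delete symbols):

```python
from collections import defaultdict

def train_ocr_channel(pairs):
    """Estimate P(observed char | true char) from (truth, ocr_output) pairs,
    assuming the strings are character-aligned one-to-one."""
    counts = defaultdict(lambda: defaultdict(float))
    for truth, observed in pairs:
        for k, o in zip(truth, observed):
            counts[k][o] += 1
    return {k: {o: c / sum(row.values()) for o, c in row.items()}
            for k, row in counts.items()}

# Toy pairs: OCR sometimes misreads ソ (so) as ン (n), a classic confusion.
pairs = [("ソン", "ソン"), ("ソン", "ンン"), ("ソン", "ソン")]
model = train_ocr_channel(pairs)
print(model["ソ"])  # -> {'ソ': 0.666..., 'ン': 0.333...}
```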
Experiments
They ran the composed model in two experiments. The first (for which they do not report results) used a set of 222 phrases that were missing from a bilingual dictionary. The second ignored the OCR stage and used only the personal-name WFSA: they produced Japanese transliterations for the names of 100 U.S. politicians (the English-to-Japanese direction), then tested their system on translating them back (Japanese-to-English). They compared the system against four politically aware native English speakers performing the same task.
| | Human | System |
|---|---|---|
| Correct | 27% | 64% |
| Phonetically correct (misspelled) | 7% | 12% |
| Incorrect | 66% | 24% |
The system vastly outperformed the humans. In addition, the authors surmised that improving the language model P(w) would fix many of the errors they observed.
Related Work
This was one of the earlier applications of WFSTs to machine translation.
Some other work:
- Training Tree Transducers, J. Graehl and K. Knight, HLT-NAACL 2004 - This paper moves beyond string-based finite-state transducers to tree transducers, exploiting tree structure to allow reordering.
- Graphical Models over Multiple Strings, M. Dreyer and J. Eisner, EMNLP 2009 - Not an MT paper, but it uses WFSTs as factors within a Markov random field, which could be useful as a model for translation.
- Parameter Estimation for Probabilistic Finite-State Transducers, J. Eisner, ACL 2002 - A more general treatment of training WFSTs.