Berger et al 1996 a maximum entropy approach to natural language processing

From Cohen Courses
Revision as of 23:29, 28 September 2011 by Fkeith (talk | contribs)

Being edited by Francis Keith

Citation

Adam Berger, Stephen Della Pietra, and Vincent Della Pietra. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1), March 1996.

Online Version

An online version is located at [1]

Summary

This oft-cited paper explains the concept of Maximum Entropy Models and relates them to natural language processing, specifically as they can be applied to Machine Translation.

Explanation and Discussion

Maximum Entropy

The paper goes into a fairly detailed explanation of the motivation behind Maximum Entropy Models. The authors divide the modeling task into two sub-problems: finding facts about the data, and incorporating those facts into the model. These facts are the "features" of the data.
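The resulting model has the log-linear form <math>p(y \mid x) = \exp\left(\sum_i \lambda_i f_i(x, y)\right) / Z(x)</math>, where each binary feature <math>f_i</math> encodes one "fact" and <math>Z(x)</math> normalizes over all outputs. A minimal sketch of this form follows; the feature functions and weights here are made up for illustration, not taken from the paper.

```python
import math

# A conditional maximum entropy model assigns
#   p(y | x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x),
# where each binary feature f_i encodes one "fact" about the data
# and Z(x) normalizes over all candidate outputs y.

def maxent_prob(x, y, labels, features, weights):
    """p(y | x) under a log-linear (maximum entropy) model."""
    def score(label):
        return math.exp(sum(w * f(x, label) for f, w in zip(features, weights)))
    z = sum(score(label) for label in labels)  # partition function Z(x)
    return score(y) / z

# Toy example with one feature over labels {"a", "b"} (illustrative only):
labels = ["a", "b"]
features = [lambda x, y: 1.0 if y == "a" and x == 1 else 0.0]
weights = [1.0]
p_a = maxent_prob(1, "a", labels, features, weights)  # exp(1) / (exp(1) + 1)
```

Training amounts to choosing the weights so that the expected value of each feature under the model matches its empirical expectation in the data.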

Experiments, Method, and Data

The case study they introduce in the paper involves Machine Translation, translating French sentences to English. The goal of the paper is to use the described maximum entropy model to augment the basic translation model. The model introduces the concept of alignments, each of which yields both a sequence of words and a mapping from the words of the input sequence to the words of the output sequence.

Translation Model

Model

The model is designed to find:

<math>\hat{e} = \arg\max_{e} p(e \mid f)</math>

where <math>\hat{e}</math> is the best English translation for the French sequence of words <math>f</math>. The probability <math>p(f \mid e)</math> can be defined as a sum of the probabilities of all possible alignments <math>a</math> of <math>f</math> and <math>e</math>:

<math>p(f \mid e) = \sum_{a} p(f, a \mid e)</math>

This is defined as the translation model. Their initial model for computing the probability of an alignment <math>a</math> of <math>f</math> and <math>e</math> is given as:

<math>p(f, a \mid e) = \prod_{i=1}^{l} n(\phi_i \mid e_i) \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \prod_{j=1}^{m} d(j \mid a_j, l, m)</math>

The first term is the product of the probabilities <math>n(\phi_i \mid e_i)</math> that a given English word <math>e_i</math> produces <math>\phi_i</math> French words (its fertility). The second term is the product of the probabilities <math>t(f_j \mid e_{a_j})</math> that the given English word <math>e_{a_j}</math> produces the French word <math>f_j</math>, and the final term is the probability of the ordering of the French words (distortion).
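The three-factor decomposition described above (fertility times word translation times distortion) can be sketched in code. This is a toy illustration, assuming the three probability tables are plain dictionaries; the numbers in the usage example are invented, not trained values from the paper.

```python
# Sketch of the alignment probability as a product of fertility,
# word-translation, and distortion terms. alignment[j] gives the index i
# of the English word assumed to produce French word j.

def alignment_prob(f_words, e_words, alignment, n, t, d):
    """p(f, a | e) under the three-factor decomposition (toy tables)."""
    prob = 1.0
    # Fertility: probability each English word e_i produces phi_i French words.
    for i, e in enumerate(e_words):
        phi = sum(1 for a in alignment if a == i)
        prob *= n.get((phi, e), 1e-9)
    # Word translation: probability e_{a_j} produces f_j.
    for j, f in enumerate(f_words):
        prob *= t.get((f, e_words[alignment[j]]), 1e-9)
    # Distortion: probability of French position j given a_j and the lengths.
    m, l = len(f_words), len(e_words)
    for j in range(m):
        prob *= d.get((j, alignment[j], l, m), 1e-9)
    return prob

# Invented toy tables for a two-word pair of sentences:
e_words = ["the", "house"]
f_words = ["la", "maison"]
alignment = [0, 1]
n = {(1, "the"): 0.9, (1, "house"): 0.8}
t = {("la", "the"): 0.5, ("maison", "house"): 0.6}
d = {(0, 0, 2, 2): 0.9, (1, 1, 2, 2): 0.9}
p = alignment_prob(f_words, e_words, alignment, n, t, d)
```

Summing this quantity over all possible alignments would give the overall translation probability of the sentence pair.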

The drawback with this model is that it makes no use of context: the word-translation probability depends only on the English word and the French word themselves. Their solution is to train a maximum entropy model <math>p_e(f \mid x)</math> for each English word <math>e</math>, giving the probability that it produces a French word <math>f</math> based on some context <math>x</math>. The new model substitutes this context-dependent probability for the context-free word-translation term:

<math>p(f, a \mid e) = \prod_{i=1}^{l} n(\phi_i \mid e_i) \prod_{j=1}^{m} p_{e_{a_j}}(f_j \mid x) \prod_{j=1}^{m} d(j \mid a_j, l, m)</math>
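The substitution can be illustrated by making the word-translation probability a per-English-word log-linear model over a context window, echoing the paper's running example of translating the English word "in" into French. The context representation, the single feature (preferring "en" before month names), and its weight are assumptions made for illustration, not the paper's actual feature set.

```python
import math

# Context-dependent translation probability: one maximum entropy model per
# English word e, conditioning on a context x. Features and weights here
# are illustrative stand-ins, not trained values from the paper.

def maxent_translation(e, f, x, models):
    """p_e(f | x): probability that English word e produces French word f
    given context x, under a per-word log-linear model."""
    labels, features, weights = models[e]
    def score(label):
        return math.exp(sum(w * g(x, label) for g, w in zip(features, weights)))
    return score(f) / sum(score(label) for label in labels)

# Toy model for "in": one feature favoring "en" before month names.
models = {
    "in": (
        ["dans", "en"],
        [lambda x, y: 1.0 if y == "en" and x.get("next") in {"April", "May"} else 0.0],
        [2.0],
    )
}

p_en = maxent_translation("in", "en", {"next": "April"}, models)
p_dans = maxent_translation("in", "dans", {"next": "April"}, models)
```

With the feature active, the model shifts probability mass toward "en"; with a different context the two renderings would revert to the uniform baseline.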

Results