Lau et al HLT 1993

Citation

Raymond Lau, Ronald Rosenfeld and Salim Roukos. Adaptive Language Modeling Using the Maximum Entropy Principle. In Proceedings of the ARPA Human Language Technology Workshop, published as Human Language Technology, pages 108–113. Morgan Kaufmann, March 1993.

Online version

ACL WEB

Summary

In this paper the authors focus on building language models using the Maximum Entropy principle, in order to combine evidence from multiple knowledge sources (for example, trigrams and long-distance triggers).

The Problem

The state of the art at the time was the trigram model, which predicts the probability of a word based on the two words preceding it; the baseline trigram model here has a perplexity of 772. Such a model is static: it cannot adapt to the style and topic of the document being processed.
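For reference, perplexity is the standard evaluation measure used here (this is the general definition, not specific to this paper): for a test text <math>w_1 \ldots w_N</math> and model <math>P</math>,

<math>PP = 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 P(w_i \mid h_i)}</math>

where <math>h_i</math> is the history available at position <math>i</math> (for a trigram model, the two preceding words). Lower is better.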

An adaptive model, in contrast, changes its estimates as a result of "seeing" some of the text. Such a model can:

  • process a large, heterogeneous data source
  • be trained on data from one domain and then used in another domain

The authors use trigger pairs: if a word sequence A is significantly correlated with another word sequence B (written A -> B), then (A, B) is considered a trigger pair, and an occurrence of A in the document history raises the estimated probability of B.

[Figure: Triggerpairs.png, examples of trigger pairs]
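Candidate trigger pairs in this line of work are typically scored by the mutual information between the event "A appears in the history" and the event "the next word is B". A toy sketch of such a scorer (illustrative only; the function name, the counts, and the exact selection criterion are assumptions, not taken from the paper):

<pre>
import math

def trigger_mi(n_ab, n_a, n_b, n_total):
    """Score a candidate trigger pair A -> B by the mutual information
    between the binary events 'A occurs in the history' and
    'the next word is B', estimated from co-occurrence counts.

    n_ab    : positions whose history contains A and whose next word is B
    n_a     : positions whose history contains A
    n_b     : positions whose next word is B
    n_total : total number of positions
    """
    mi = 0.0
    for in_a in (True, False):
        for is_b in (True, False):
            na = n_a if in_a else n_total - n_a   # marginal count, A side
            nb = n_b if is_b else n_total - n_b   # marginal count, B side
            if in_a and is_b:
                nj = n_ab
            elif in_a:
                nj = n_a - n_ab
            elif is_b:
                nj = n_b - n_ab
            else:
                nj = n_total - n_a - n_b + n_ab
            if nj > 0:
                # p(joint) * log2( p(joint) / (p(a) * p(b)) )
                mi += (nj / n_total) * math.log2(nj * n_total / (na * nb))
    return mi

# hypothetical counts, e.g. 'loan' in history vs. next word 'bank'
print(trigger_mi(n_ab=120, n_a=2000, n_b=900, n_total=1_000_000))
</pre>

Pairs with the highest scores would then be kept as constraints.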

Given the document processed so far (the history <math>h</math>) and a word <math>w</math> considered for the next position, there are many different estimates of <math>P(w|h)</math>, each derived from a different trigger. How should they be combined?

The Solution

The Maximum Entropy Principle is a technique commonly used to combine several different knowledge sources into a single estimate. The method reformulates each estimate as a constraint on the expectation of some function, to be satisfied by the combined estimate, and then chooses, among all the probability distributions that satisfy these constraints, the one with the highest entropy.

Given a general event space <math>\{x\}</math>, to derive a combined probability function <math>P(x)</math>, each constraint <math>i</math> is associated with a constraint function <math>f_i(x)</math> and a desired expectation <math>c_i</math>. The constraint is then written as:

<math>E_P f_i \equiv \sum_x P(x) f_i(x) = c_i</math>

Given consistent constraints, a unique ME solution is guaranteed to exist, and to be of the form:

<math>P(x) = \prod_i \mu_i^{f_i(x)}</math>

where the <math>\mu_i</math>'s are some unknown constants, to be found.

To search the exponential family defined by this form for the <math>\mu_i</math>'s that will make <math>P(x)</math> satisfy all the constraints, the authors used an iterative algorithm, Generalized Iterative Scaling (GIS) [Darroch and Ratcliff 72], which is guaranteed to converge to the solution.
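To make the update concrete, here is a minimal sketch of GIS over a small discrete event space (toy code, not the paper's implementation; the matrix F, the targets, and the fixed iteration count are assumptions):

<pre>
import numpy as np

def gis(F, c, iters=500):
    """Minimal Generalized Iterative Scaling sketch.

    F : (n_events, n_features) array, F[x, i] = f_i(x) >= 0
    c : desired expectations c_i for each constraint function
    GIS requires the feature values of every event to sum to the same
    constant C; in practice a 'slack' feature is added to ensure this.
    """
    C = F.sum(axis=1)[0]
    assert np.allclose(F.sum(axis=1), C), "add a slack feature first"
    mu = np.ones(F.shape[1])              # the unknown constants mu_i
    for _ in range(iters):
        # current model: P(x) proportional to prod_i mu_i^{f_i(x)}
        logp = F @ np.log(mu)
        p = np.exp(logp - logp.max())
        p /= p.sum()
        e = F.T @ p                       # model expectations E_P[f_i]
        mu *= (c / e) ** (1.0 / C)        # multiplicative GIS update
    return mu

# toy usage: fit mu to the feature expectations of a target distribution q
F = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 0.]])             # every row sums to C = 2
q = np.array([0.5, 0.3, 0.2])
mu = gis(F, F.T @ q)
</pre>

Each iteration scales every <math>\mu_i</math> by how far the model's expectation of <math>f_i</math> is from its desired value, which is why consistent constraints guarantee convergence.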

Trigger pairs are formulated as the empirical expectation of a constraint function <math>f_{A \rightarrow B}</math>, defined as:

<math>
f_{A \rightarrow B}(h,w) =
\begin{cases}
1, & \mbox{if }A \in h, w = B\\
0, & \mbox{otherwise}
\end{cases}
</math>
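In other words, the associated ME constraint requires the model's expectation of <math>f_{A \rightarrow B}</math> to match the function's empirical expectation in the training data, schematically (the notation here is mine):

<math>\sum_{h,w} P(h,w) f_{A \rightarrow B}(h,w) = \tilde{E}\left[f_{A \rightarrow B}\right]</math>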

To incorporate the previous static (trigram) model, the authors formulated constraint functions to fit the ML/ME paradigm as:
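By analogy with the trigger constraint above, a trigram constraint function would presumably take the form (a sketch, not quoted from the paper):

<math>
f_{w_1,w_2,w_3}(h,w) =
\begin{cases}
1, & \mbox{if } h \mbox{ ends in } (w_1,w_2), w = w_3\\
0, & \mbox{otherwise}
\end{cases}
</math>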