Difference between revisions of "GeneralizedIterativeScaling"

From Cohen Courses
Jump to navigationJump to search
Line 21: Line 21:
 
If <math>p</math> of form (1) exists satisfying (2), then it minimizes <math>KL[p, \pi]</math> and is unique. Since <math>\pi_i</math> are constant; it essentially boils down to the following statement.  
 
If <math>p</math> of form (1) exists satisfying (2), then it minimizes <math>KL[p, \pi]</math> and is unique. Since <math>\pi_i</math> are constant; it essentially boils down to the following statement.  
  
=== Maximizing Entropy ===  
+
=== Maximum entropy ===
  
 
If there exists a positive probability function of the form  
 
If there exists a positive probability function of the form  
Line 30: Line 30:
  
 
<math> H(p) = - \sum_{i\in I} p_i  log(p_i)</math>
 
<math> H(p) = - \sum_{i\in I} p_i  log(p_i)</math>
 +
 +
This statement is equivalent to saying that if there are a set of features whose expected value is known, then the probability distribution (if there exists one) that maximizes the entropy (makes minimum assumptions) is of the form (1).
 +
 +
== Algorithm ==
 +
 +
Given that constraints (2) is satisfied by atleast one sub-probability function (this condition is also known as consistency of constraints), then (1) and (2) can be expressed as
 +
 +
<math> (1') \quad \quad p_i = \pi_i \prod_{r=1}^{c} \lambda_r^{a_{ri}}</math>
 +
 +
<math> (2') \quad \quad \sum_{i\in I} a_{ri} p_i = h_r, \quad \quad r = 1, 2, \dots, c </math>
 +
 +
Define
 +
 +
<math>p_i^{(0)} = \pi_i</math>
 +
 +
Iterate
 +
 +
<math>p_i^{(n+1)} = p_i^{(n)} \prod_{r=1}^c \left( \frac{h_r}{h_r^{(n)}} \right)^{a_{ri}};</math>
 +
 +
where
 +
 +
<math>h_r^{(n)} = \sum_{i\in I} a_{ri}p_i^{(n)} </math>

Revision as of 10:30, 27 September 2011

This is one of the earliest methods used for inference in log-linear models. Though more sophisticated and faster methods have evolved, this method provides an insight in log linear models.

What problem does it address

The objective of this method is to find a probability function of the form

satisfying the constraints

where is an index set; the probability distribution over which has to be determined, is a probability distribution and is a subprobability function (adds to 1 but for any ); is constant.

Existence of a solution

If of form (1) exists satisfying (2), then it minimizes and is unique. Since are constant; it essentially boils down to the following statement.

Maximum entropy

If there exists a positive probability function of the form

satisfying (2), then it maximizes the entropy

This statement is equivalent to saying that if there are a set of features whose expected value is known, then the probability distribution (if there exists one) that maximizes the entropy (makes minimum assumptions) is of the form (1).

Algorithm

Given that constraints (2) is satisfied by atleast one sub-probability function (this condition is also known as consistency of constraints), then (1) and (2) can be expressed as

Define

Iterate

where