Difference between revisions of "GeneralizedIterativeScaling"

Latest revision as of 00:41, 3 November 2011

Generalized iterative scaling (GIS) is one of the earliest methods used for estimating the parameters of log-linear models. Although faster and more sophisticated methods have since been developed, it still provides useful insight into log-linear models.

What problem does it address

The objective of this method is to find a probability function of the form

$(1)\quad \quad p_{i}=\pi _{i}\mu \prod _{s=1}^{d}\mu _{s}^{b_{si}}$

satisfying the constraints

$(2)\quad \quad \sum _{i\in I}b_{si}p_{i}=k_{s}$

where $I$ is a finite index set over which the probability distribution $p$ is to be determined, $\pi$ is a given sub-probability function (nonnegative and summing to at most 1, with $\pi _{i}\neq 0$ for every $i$), and the $b_{si}$ and $k_{s}$ are given constants.

Since $\log \left({\frac {p_{i}}{\pi _{i}}}\right)=\log \mu +\sum _{s=1}^{d}b_{si}\log \mu _{s}$ is linear in $\log \mu$ and $\log \mu _{s}$ , $p$ belongs to the log-linear family.

Existence of a solution

If a distribution $p$ of the form (1) satisfying (2) exists, then among all probability distributions satisfying (2) it minimizes

$KL[p,\pi ]=\sum _{i}p_{i}\log \left({\frac {p_{i}}{\pi _{i}}}\right)$

and is unique. When the $\pi _{i}$ are all equal (that is, $\pi$ is uniform), minimizing this divergence is the same as maximizing the entropy, which gives the following statement.

Maximum entropy

If there exists a positive probability function of the form

$p_{i}=\mu \prod _{s=1}^{d}\mu _{s}^{b_{si}}$

satisfying (2), then among all probability functions satisfying (2) it maximizes the entropy

$H(p)=-\sum _{i\in I}p_{i}\log(p_{i})$

In other words, if the expected values of a set of features are known, then the probability distribution that maximizes the entropy subject to those expectations (that is, the one that makes the fewest additional assumptions), if it exists, has the form (1) with constant $\pi _{i}$.
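A brief Lagrange-multiplier sketch shows where the product form comes from. The multipliers $\theta _{0}$ and $\theta _{s}$ are notation introduced here for illustration and do not appear in the source; the argument only identifies the stationary point and relies on the concavity of $H$ for it to be a maximum.

$L(p)=-\sum _{i\in I}p_{i}\log(p_{i})+\theta _{0}\left(\sum _{i\in I}p_{i}-1\right)+\sum _{s=1}^{d}\theta _{s}\left(\sum _{i\in I}b_{si}p_{i}-k_{s}\right)$

$\frac {\partial L}{\partial p_{i}}=-\log(p_{i})-1+\theta _{0}+\sum _{s=1}^{d}\theta _{s}b_{si}=0$

$p_{i}=e^{\theta _{0}-1}\prod _{s=1}^{d}\left(e^{\theta _{s}}\right)^{b_{si}}=\mu \prod _{s=1}^{d}\mu _{s}^{b_{si}},\quad \quad \mu =e^{\theta _{0}-1},\quad \mu _{s}=e^{\theta _{s}}$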

Algorithm

Provided that the constraints (2) are satisfied by at least one sub-probability function (a condition also known as consistency of the constraints), (1) and (2) can be re-expressed as

$(1')\quad \quad p_{i}=\pi _{i}\prod _{r=1}^{c}\lambda _{r}^{a_{ri}}$

$(2')\quad \quad \sum _{i\in I}a_{ri}p_{i}=h_{r},\quad \quad r=1,2,\dots ,c$

where $a_{ri}\geq 0,\quad \sum _{r=1}^{c}a_{ri}=1,\quad h_{r}>0,\quad \sum _{r=1}^{c}h_{r}=1$
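The page does not spell out how to get from (1), (2) to the normalized form (1'), (2'). The following is a minimal sketch of one possible construction, not necessarily the one used in the original paper: shift each feature so it is nonnegative, scale so that no column total exceeds 1, and add a slack feature so that every column sums to exactly 1. The function name `normalize_constraints` and its arguments are hypothetical, and the sketch assumes that $\sum _{i}p_{i}=1$ and that at least one feature is not constant.

```python
def normalize_constraints(b, k):
    """Sketch: map features b[s][i] and targets k[s] to (a, h) as in (1'), (2').

    Assumes sum_i p_i = 1 and that at least one feature is non-constant
    (otherwise the scale C below would be zero).
    """
    d, n = len(b), len(b[0])
    # Shift each feature so that its minimum over i is 0.
    u = [min(row) for row in b]
    shifted = [[b[s][i] - u[s] for i in range(n)] for s in range(d)]
    # Scale by the largest column total so that no column sum exceeds 1.
    C = max(sum(shifted[s][i] for s in range(d)) for i in range(n))
    a = [[shifted[s][i] / C for i in range(n)] for s in range(d)]
    # Targets transform the same way because sum_i p_i = 1.
    h = [(k[s] - u[s]) / C for s in range(d)]
    # Slack feature: makes every column of a sum to exactly 1.
    a.append([1.0 - sum(a[s][i] for s in range(d)) for i in range(n)])
    h.append(1.0 - sum(h))
    return a, h


# One feature taking the values 0, 1, 2 with target expectation 1.4.
a, h = normalize_constraints([[0.0, 1.0, 2.0]], [1.4])
```

Whether the resulting $h_{r}$ are all strictly positive depends on the targets $k_{s}$; the sketch does not check this.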

Define

$p_{i}^{(0)}=\pi _{i}$

Iterate

$p_{i}^{(n+1)}=p_{i}^{(n)}\prod _{r=1}^{c}\left({\frac {h_{r}}{h_{r}^{(n)}}}\right)^{a_{ri}};$

where

$h_{r}^{(n)}=\sum _{i\in I}a_{ri}p_{i}^{(n)}$
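The iteration above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than a reference implementation: the function name `gis`, the stopping tolerance `tol`, the iteration cap `n_iter`, and the three-point example are all assumptions introduced here. The update is computed in log space, which is algebraically the same as the product in the formula.

```python
import math


def gis(a, h, pi, n_iter=5000, tol=1e-12):
    """Sketch of generalized iterative scaling for the form (1'), (2').

    a  : list of c rows, a[r][i] >= 0, every column summing to 1
    h  : list of c targets, h[r] > 0, summing to 1
    pi : list of n strictly positive starting weights
    """
    c, n = len(a), len(pi)
    p = list(pi)  # p^(0) = pi
    for _ in range(n_iter):
        # h_r^(n) = sum_i a_ri * p_i^(n)
        h_n = [sum(a[r][i] * p[i] for i in range(n)) for r in range(c)]
        # Stop once all constraints are met to within tol (an added criterion).
        if max(abs(h_n[r] - h[r]) for r in range(c)) < tol:
            break
        # Log of the scaling ratio h_r / h_r^(n) for each constraint r.
        log_ratio = [math.log(h[r] / h_n[r]) for r in range(c)]
        # p_i^(n+1) = p_i^(n) * prod_r (h_r / h_r^(n)) ** a_ri
        p = [p[i] * math.exp(sum(a[r][i] * log_ratio[r] for r in range(c)))
             for i in range(n)]
    return p


# Hypothetical example: three outcomes, two normalized features.
a = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
h = [0.3, 0.7]
pi = [1 / 3, 1 / 3, 1 / 3]
p = gis(a, h, pi)
```

Because every column of `a` sums to 1 and the entries of `h` sum to 1, a `p` that meets both constraints also sums to 1, so no separate normalization step is needed.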

Used in

Freitag 2000 Maximum Entropy Markov Models for Information Extraction and Segmentation

References

J. N. Darroch and D. Ratcliff, Generalized Iterative Scaling for Log-Linear Models. The Annals of Mathematical Statistics, 43(5), 1972.

