Difference between revisions of "Generalized Iterative Scaling"

Revision as of 20:41, 29 September 2011

The method

Generalized Iterative Scaling (GIS) is a method that searches the exponential family for the Maximum Entropy solution, which has the form:

$P(x)=\prod _{i}\mu _{i}^{f_{i}(x)}$

where the $\mu _{i}$'s are unknown constants to be found. The $\mu _{i}$'s of the solution are those that make $P(x)$ satisfy every constraint, i.e., match each target value $K_{i}$ in the equation:

$E_{P}f_{i}{\overset {\underset {\mathrm {def} }{}}{=}}\sum _{x}P(x)f_{i}(x)=K_{i}$
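
To see why this product form belongs to the exponential (log-linear) family, it can help to rewrite it with $\lambda _{i}=\log \mu _{i}$. The symbol $\lambda _{i}$ is introduced here only for illustration, and the rewriting assumes every $\mu _{i}$ is positive:

$P(x)=\prod _{i}\mu _{i}^{f_{i}(x)}=\exp \left(\sum _{i}\lambda _{i}f_{i}(x)\right)$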

The Algorithm

GIS starts with arbitrary $\mu _{i}^{(0)}$ values, which define the initial probability estimate:

$P^{(0)}(x){\overset {\underset {\mathrm {def} }{}}{=}}\prod _{i}\left(\mu _{i}^{(0)}\right)^{f_{i}(x)}$

Each subsequent iteration is intended to produce an estimate that matches the constraints better than the previous one. Iteration $j$ follows these steps:

(1) Compute the expectations of all the $f_{i}$'s under the current estimate, i.e., $E_{P^{(j)}}f_{i}=\sum _{x}P^{(j)}(x)f_{i}(x)$

(2) Compare these expectations with the desired values $K_{i}$, and update the $\mu _{i}$'s as follows:

$\mu _{i}^{(j+1)}=\mu _{i}^{(j)}\cdot {\frac {K_{i}}{E_{P^{(j)}}f_{i}}}$

(3) Set the new estimate function:

$P^{(j+1)}(x){\overset {\underset {\mathrm {def} }{}}{=}}\prod _{i}\left(\mu _{i}^{(j+1)}\right)^{f_{i}(x)}$

(4) If convergence or near-convergence is reached, stop; otherwise go back to step (1).
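
The four steps can be sketched in Python on a tiny finite domain. This is a minimal illustration, not a reference implementation: the names `gis`, `features`, and `targets` are hypothetical, the estimate is explicitly normalized at every iteration, and each outcome's feature values are assumed to be nonnegative and to sum to 1 (the setting of Darroch and Ratcliff, under which the update in step (2) applies as written). None of these choices is stated on this page.

```python
import math


def gis(features, targets, iterations=200, tol=1e-10):
    """Fit mu so that P(x), proportional to prod_i mu[i] ** features[x][i],
    reproduces the target expectations.

    Assumption (not from the page): every row of `features` is nonnegative
    and sums to 1, and the estimate is renormalized at each iteration.
    """
    n_features = len(targets)
    mu = [1.0] * n_features  # arbitrary starting values mu^(0)

    def model(current_mu):
        # prod_i mu_i^{f_i(x)} for every x, then normalize so P sums to 1.
        weights = [
            math.prod(m ** f for m, f in zip(current_mu, row))
            for row in features
        ]
        z = sum(weights)
        return [w / z for w in weights]

    def expectations(p):
        # E_P f_i = sum_x P(x) f_i(x) for every feature i.
        return [
            sum(px * row[i] for px, row in zip(p, features))
            for i in range(n_features)
        ]

    for _ in range(iterations):
        p = model(mu)                     # step (3) of the previous pass
        expected = expectations(p)        # step (1)
        gap = max(abs(e - k) for e, k in zip(expected, targets))
        if gap < tol:                     # step (4)
            break
        # Step (2): scale each mu_i by K_i / E_P f_i.
        mu = [m * k / e for m, k, e in zip(mu, targets, expected)]

    return mu, model(mu)


# Three outcomes, two features; each row sums to 1.
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
targets = [0.4, 0.6]  # desired values K_i
mu, p = gis(features, targets)
print("mu:", mu)
print("P:", p)
```

When features sum to a constant $C$ other than 1, the usual variant raises the ratio in step (2) to the power $1/C$; that case is not covered by this sketch.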

Intrinsic characteristics

GIS has three advantages over other methods: it can incorporate feature selection, it scales well with the number of features, and it is resilient to feature dependence.

On the other hand, GIS has problems with smoothing and is relatively slow to train compared to other classification methods.

Related Papers

John N. Darroch and Douglas Ratcliff. (1972). "Generalized Iterative Scaling for Log-Linear Models." In: The Annals of Mathematical Statistics, 43(5). [http://www.cs.nyu.edu/~roweis/csc412-2006/extras/gis.pdf]

Adwait Ratnaparkhi. (1996). "A Maximum Entropy Model for Part-of-Speech Tagging." In: Proceedings of EMNLP Conference (EMNLP 1996). [http://acl.ldc.upenn.edu/W/W96/W96-0213.pdf]

Andrew McCallum, Dayne Freitag, and Fernando Pereira. (2000). "Maximum Entropy Markov Models for Information Extraction and Segmentation." In: Proceedings of ICML-2000. [http://www.cs.umass.edu/~mccallum/papers/memm-icml2000.ps]
