Contrastive Estimation
Contrastive estimation is a method proposed in Smith and Eisner (2005), ''Contrastive Estimation: Training Log-Linear Models on Unlabeled Data''.
The approach estimates log-linear models (e.g., Conditional Random Fields) in an unsupervised fashion. It focuses on the denominator of the log-linear model, exploiting so-called implicit negative evidence in the probability mass.
== Motivation ==
In the Smith and Eisner (2005) paper, the authors survey different estimation techniques for probabilistic graphical models. For HMMs, one usually optimizes the joint likelihood. For log-linear models, various methods have been proposed to optimize the conditional probability; beyond these, there are also methods that directly maximize the classification accuracy, the sum of conditional likelihoods, or the expected local accuracy. However, none of these estimation techniques specifically focuses on the implicit negative evidence in the denominator of the standard log-linear model in an unsupervised setting.
== How it Works ==
Unlike the above methods, the contrastive estimation approach optimizes:
:<math> \prod_{i} p(x_i \mid N(x_i), \theta) </math>
Here, the function <math>N(x_i)</math> denotes the neighborhood of <math>x_i</math>: a set of implicit negative examples together with <math>x_i</math> itself. The idea is to move probability mass from the neighborhood of <math>x_i</math> onto <math>x_i</math> itself, so that a good denominator in the log-linear model can not only improve task accuracy, but also reduce the computation required for the normalization part of the model.
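To make the neighborhood concrete, below is a minimal Python sketch of one neighborhood from the paper, TRANS1, which consists of the observed sentence plus every sentence obtained by transposing a single pair of adjacent words; the function name and string representation here are hypothetical choices, not the paper's.
<pre>
def trans1_neighborhood(sentence):
    # TRANS1: the observed sentence plus all single adjacent-word
    # transpositions; everything except the original serves as an
    # implicit negative example.
    words = sentence.split()
    neighbors = [sentence]  # N(x_i) includes x_i itself
    for i in range(len(words) - 1):
        swapped = words[:i] + [words[i + 1], words[i]] + words[i + 2:]
        neighbors.append(" ".join(swapped))
    return neighbors

print(trans1_neighborhood("the dog barked"))
# ['the dog barked', 'dog the barked', 'the barked dog']
</pre>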
== Problem Formulation and the Detailed Algorithm ==
Assume we have a log-linear model parameterized by <math>\theta</math>, where the input example is <math>x</math> and the output label is <math>y</math>. A standard log-linear model takes the form
:<math> p(x,y|\theta) = \frac{\exp(\theta \cdot f(x,y))}{Z(\theta)} </math>
where <math>f(x,y)</math> is the feature vector.
Here, we can use <math>u(x,y|\theta)</math> to represent the unnormalized score <math>\exp(\theta \cdot f(x,y))</math>. <math>Z(\theta) = \sum_{(x,y) \in X \times Y} u(x,y|\theta)</math> is the partition function, and it is hard to compute because it sums over a much larger space. Then, we can represent the objective function as
:<math> \log \prod_{i} \frac{u(x_i,y_i|\theta)}{\sum_{(x,y) \in A_i} u(x,y|\theta)} </math>
where <math>(x_i,y_i) \in A_i</math> for each <math>i</math>; different choices of <math>A_i</math> recover the estimation criteria surveyed above.
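As a toy illustration of these quantities, here is a small Python sketch that computes <math>u(x,y|\theta)</math> and the objective above; the feature function, label set, and all names are invented for this example and are not from the paper.
<pre>
import math

def f(x, y):
    # Toy feature vector; a real model would use rich features
    # (e.g., tag-word and tag-tag features for POS tagging).
    return [1.0 if y == "GOOD" else 0.0, float(len(x.split()))]

def u(x, y, theta):
    # Unnormalized score u(x,y|theta) = exp(theta . f(x,y)).
    return math.exp(sum(t * fi for t, fi in zip(theta, f(x, y))))

def log_objective(data, A, theta):
    # log prod_i u(x_i,y_i|theta) / sum_{(x,y) in A_i} u(x,y|theta),
    # where A[i] is an enumerable set of (x,y) pairs containing (x_i,y_i).
    total = 0.0
    for (xi, yi), Ai in zip(data, A):
        num = u(xi, yi, theta)
        den = sum(u(x, y, theta) for (x, y) in Ai)
        total += math.log(num / den)
    return total
</pre>
For instance, choosing <math>A_i = \{x_i\} \times Y</math> makes each factor the conditional likelihood <math>p(y_i|x_i,\theta)</math>, while <math>A_i = X \times Y</math> (usually intractable to enumerate) gives the joint likelihood.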
In the unsupervised setting, the contrastive estimation method maximizes
:<math> \log \prod_{i} \frac{\sum_{y \in Y} u(x_i,y|\theta)}{\sum_{(x,y) \in N(x_i) \times Y} u(x,y|\theta)} </math>
that is, the total mass of all labelings of the observed <math>x_i</math>, relative to the total mass of all labelings of everything in its neighborhood.
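Putting the pieces together, the following sketch evaluates this unsupervised objective on a toy problem, reusing the hypothetical <code>u()</code> and <code>trans1_neighborhood()</code> helpers from the sketches above; it assumes a small enumerable label set <math>Y</math>, whereas the paper handles real sequence models with lattice representations and dynamic programming.
<pre>
def ce_log_objective(examples, Y, theta):
    # log prod_i (sum_{y in Y} u(x_i,y|theta))
    #          / (sum_{x in N(x_i), y in Y} u(x,y|theta)),
    # with N given by the TRANS1 neighborhood sketched earlier.
    total = 0.0
    for xi in examples:
        num = sum(u(xi, y, theta) for y in Y)
        den = sum(u(x, y, theta)
                  for x in trans1_neighborhood(xi)
                  for y in Y)
        total += math.log(num / den)
    return total

theta = [0.5, -0.1]
print(ce_log_objective(["the dog barked"], ["GOOD", "BAD"], theta))
</pre>
Note that the denominator only sums over the neighborhood <math>N(x_i) \times Y</math> rather than the full space <math>X \times Y</math>, which is what makes the normalization tractable compared to joint likelihood estimation.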