Daume ICML 2009

Revision as of 22:31, 29 September 2011

In progress by Francis Keith

Citation

"Unsupervised Search-based Structured Prediction", Hal Daume III, ICML 2009

Online Version

An online version of the paper can be found here [1].

Summary

This paper details a methodology for unsupervised SEARN. It compares the approach against other methods, first on synthetic data and then on unsupervised dependency parsing.

Algorithm

The basic SEARN algorithm is described (see the SEARN wiki article for more background).
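As background, the SEARN loop can be sketched on a toy task. The helper names, the trivial labeling task (predict y_t = x_t), and the simplified cost computation below are all illustrative assumptions, not the paper's exact formulation:

```python
import random

random.seed(0)

# Toy sketch of the SEARN loop: start from the optimal policy, repeatedly
# collect cost-sensitive examples, train a classifier, and stochastically
# interpolate it into the current policy.
ACTIONS = [0, 1]

def optimal_policy(x, t):
    # With supervision, the optimal action at position t is the true label
    # (here the task is trivially y_t = x_t).
    return x[t]

def train_classifier(data):
    # Cost-sensitive "training": per observed feature, pick the action
    # with the lowest accumulated cost.
    totals = {}
    for feat, costs in data:
        acc = totals.setdefault(feat, [0.0] * len(ACTIONS))
        for a, c in enumerate(costs):
            acc[a] += c
    def h(x, t):
        feat = x[t]
        if feat in totals:
            return min(ACTIONS, key=lambda a: totals[feat][a])
        return random.choice(ACTIONS)
    return h

def searn(examples, beta=0.5, iterations=3):
    policy = optimal_policy
    for _ in range(iterations):
        data = []
        for x in examples:
            for t in range(len(x)):
                # Cost of each action relative to the truth — a stand-in
                # for SEARN's rollout-based cost estimates.
                costs = [0.0 if a == x[t] else 1.0 for a in ACTIONS]
                data.append((x[t], costs))
        h = train_classifier(data)
        # Stochastically mix the new classifier into the current policy.
        old, new = policy, h
        policy = (lambda o, n: lambda x, t: n(x, t)
                  if random.random() < beta else o(x, t))(old, new)
    return policy

examples = [[0, 1, 1, 0], [1, 0, 1, 1]]
policy = searn(examples)
predicted = [policy(examples[0], t) for t in range(4)]
```

On this toy task both the learned classifier and the optimal policy agree, so the mixed policy reproduces the labels exactly.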

In the supervised form, the algorithm uses a sample space of <math>(x,y)</math> pairs, where <math>x</math> is the input and <math>y</math> is the true output. In the unsupervised case, we must account for the fact that the algorithm must be run on an input of simply <math>x</math>, with the classifier still producing <math>y</math>.

The proposed solution is to essentially predict <math>y</math>, and then perform the normal prediction. The loss function will only be dependent on <math>x</math>.
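Since the loss depends only on the input x, one concrete choice (an assumption here, not specified by this summary) is a Hamming-style reconstruction loss over the predicted copy of the input:

```python
def unsupervised_loss(prediction, x):
    # The loss inspects only x: count how many of the last T components of
    # the 2T-length prediction fail to reconstruct the input. The latent
    # first half is not penalized directly. (Hamming loss is an
    # illustrative choice.)
    T = len(x)
    return sum(1 for p, true in zip(prediction[T:], x) if p != true)

# The latent half ["N", "V"] contributes nothing; one reconstruction error.
loss = unsupervised_loss(["N", "V", "the", "cat"], ["the", "dog"])
```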

Given an input <math>x</math>, which is a sequence of length <math>T</math>, we consider the true output to be of length <math>2T</math>, where the first <math>T</math> components will be sampled from the possible vocabulary for <math>y</math>, and represent the latent structure, and the last <math>T</math> components are the true input <math>x</math>. We can then use SEARN on this input.
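The augmented-output construction above can be sketched as follows; the example tokens and label vocabulary are hypothetical:

```python
import random

rng = random.Random(1)

def augmented_output(x, y_vocab):
    # First T components: latent structure, drawn from the vocabulary of y
    # (a random draw here, since any latent choice is admissible a priori).
    latent = [rng.choice(y_vocab) for _ in x]
    # Last T components: the observed input itself.
    return latent + list(x)

x = ["the", "dog", "barks"]
out = augmented_output(x, y_vocab=["DT", "NN", "VB"])
```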

The optimal (and thus initial) policy <math>\pi^*</math> already fully predicts the true input, and it can predict the latent structure arbitrarily/randomly. This gives us the ability to run the first iteration of SEARN.
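A minimal sketch of this initial policy, assuming the 2T-length construction above (the label vocabulary is hypothetical):

```python
import random

def initial_policy(x, t, y_vocab, rng):
    # On the observed half (t >= T) the true output is x itself, so the
    # optimal policy reproduces it exactly; on the latent half (t < T) any
    # action is optimal w.r.t. a loss that depends only on x, so the
    # policy may pick arbitrarily/randomly.
    T = len(x)
    if t >= T:
        return x[t - T]
    return rng.choice(y_vocab)

x = ["a", "b"]
rng = random.Random(0)
# Positions 2 and 3 fall in the observed half and must reproduce x.
observed = [initial_policy(x, t, ["L1", "L2"], rng) for t in range(2, 4)]
```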