In progress by [[User:Fkeith|Francis Keith]]
== Citation ==
"Unsupervised Search-based Structured Prediction", Hal Daume III, ICML 2009
== Online Version ==
An online version of the paper can be found here [1].
== Summary ==
This paper details a methodology for unsupervised [[UsesMethod::SEARN|SEARN]]. It compares the results to other approaches, first on synthetic data, and then against other methods for unsupervised dependency parsing.
== Algorithm ==
The basic [[UsesMethod::SEARN|SEARN]] algorithm is described (see the SEARN wiki article for more background).
In the supervised form, the algorithm uses a sample space of <math>(x,y)</math> pairs, where <math>x</math> is the input and <math>y</math> is the true output. In the unsupervised case, we must account for the fact that the algorithm must be run on an input of simply <math>x</math>, with the classifier still producing <math>y</math>.

The proposed solution is essentially to predict <math>y</math> first, and then perform the normal prediction. The loss function will then depend only on <math>x</math>.
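To make this concrete, here is a minimal Python sketch of a loss that depends only on <math>x</math>. The function name and the Hamming-style reconstruction loss are illustrative assumptions, not the paper's exact choice:

<pre>
def unsupervised_loss(x, y_hat):
    """Loss that depends only on the observed input x.

    y_hat is a predicted output of length 2T: the first T positions
    hold the latent structure (ignored here), and the last T hold the
    model's reconstruction of x.
    """
    T = len(x)
    reconstruction = y_hat[T:]
    # Hamming-style loss: count positions where the reconstruction
    # disagrees with the true input; the latent half contributes nothing.
    return sum(1 for a, b in zip(x, reconstruction) if a != b)
</pre>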
Given an input <math>x</math>, which is a sequence of length <math>T</math>, we consider the true output to be of length <math>2T</math>: the first <math>T</math> components are sampled from the possible vocabulary for <math>y</math> and represent the latent structure, and the last <math>T</math> components are the true input <math>x</math>. We can then run [[UsesMethod::SEARN|SEARN]] on this augmented input.
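A minimal sketch of this augmentation, assuming a generic latent vocabulary; the helper name <code>augment</code> and the random initialization of the latent half are illustrative:

<pre>
import random

def augment(x, latent_vocab):
    """Build the length-2T 'true output' for an input x of length T.

    Positions 0..T-1 are latent-structure slots sampled from the
    vocabulary for y; positions T..2T-1 are the true input x itself.
    """
    T = len(x)
    latent_part = [random.choice(latent_vocab) for _ in range(T)]
    observed_part = list(x)
    return latent_part + observed_part
</pre>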
As for <math>\pi^*</math>, the optimal (and thus initial) policy: it already predicts the true input perfectly, and it can predict the latent components arbitrarily/randomly. This gives us the ability to run the first iteration of [[UsesMethod::SEARN|SEARN]].
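A sketch of this initial policy, assuming the policy sees the input <math>x</math> and the current output position <math>t</math>; the signature is an assumption for illustration:

<pre>
import random

def initial_policy(x, t, latent_vocab):
    """pi*: the optimal (and thus initial) policy on the length-2T output.

    Under a loss that ignores the latent half, any latent prediction is
    optimal, so positions t < T are filled arbitrarily/randomly; positions
    t >= T are predicted perfectly by copying the true input.
    """
    T = len(x)
    if t < T:
        return random.choice(latent_vocab)  # latent structure: arbitrary
    return x[t - T]                         # observed half: copy the input
</pre>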