Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction

From Cohen Courses

Revision as of 05:07, 1 October 2011 by Jmflanig (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Jump to navigation Jump to search

Contents

1 Citation and Link
2 Summary
3 Method
4 Experimental Result
5 Related Papers

Citation and Link

Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction

Summary

The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is basically SEARN but analyzed differently (and also ~24 pages shorter).

LaSO attempts to combine the learning of the model with the search that occurs during decoding. Instead of learning the model and then doing a search during decoding, LaSO attempts to directly learn to search.

Method

The generic search (decoding) algorithm is shown below:

The enqueue function puts the nodes onto the queue in some order. Depending on the order that the enqueue function puts nodes on the queue, you can get depth-first, breadth-first, beam, heuristic, A*, etc search algorithms from standard AI textbooks.

In LaSO enqueue ranks nodes according to a function g which is a linear in the set of features. The features can depend on the input x and the path to the current current node n:

$g=\mathbf {\omega } ^{\top }\mathbf {\Phi } (x,n)$

LaSO learns the feature weights from the training examples. The learning algorithm is shown below:

Nodes for which there exists a path to the optimal goal are called "y-good" nodes. The siblings function denotes the set of nodes at the same depth as current node that can reach the goal (i.e. are y-good). The algorithm may have to backtrack to find them if they are not currently in the queue.

If the search makes a mistake, the weights are updated with the function update. The two functions they propose in the paper are the perceptron update and a variant of the approximate large margin update (ALMA).

Perceptron update rule:

$\mathbf {\omega } \leftarrow \mathbf {\omega } +\Delta$

ALMA update rule:

$\mathbf {\omega } \leftarrow \mathbf {\omega } +\wp (\mathbf {\omega } +Ck^{-1/2}\wp (\Delta ))$

where

$\wp (\mathbf {u} )=\mathbf {u} /\max\{1,\lVert \mathbf {u} \rVert _{2}\}$

and k starts at 1 and is incremented at each update.

Experimental Result

Related Papers

In progress by User:Jmflanig

Retrieved from "http://curtis.ml.cmu.edu/w/courses/index.php?title=Daume_and_Marcu_2005_Learning_as_Search_Optimization:_Approximate_Large_Margin_Methods_for_Structured_Prediction&oldid=8664"

Navigation menu