Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Citation and Link
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Summary
The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is essentially SEARN, but analyzed differently (and described in a paper that is ~24 pages shorter).
LaSO combines learning the model with the search that occurs during decoding: instead of first learning a model and then searching with it at decoding time, LaSO directly learns to search.
Method
The generic search (decoding) algorithm is shown below:
The enqueue function inserts new nodes into the queue in some order. Depending on that order, you recover depth-first, breadth-first, beam, heuristic, A*, and other standard search algorithms from AI textbooks.
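A minimal sketch of this generic loop in Python (the expand, is_goal, and example enqueue functions here are illustrative stand-ins, not the paper's exact pseudocode):

def search(initial, expand, is_goal, enqueue, w):
    """Generic search loop: the search strategy is determined entirely by enqueue."""
    nodes = [initial]                        # the queue of frontier nodes
    while nodes:
        node = nodes.pop(0)                  # remove the front of the queue
        if is_goal(node):
            return node                      # a complete (goal) output
        nodes = enqueue(nodes, expand(node), w)
    return None                              # search space exhausted

# A few possible enqueue choices, to show how the strategy is controlled:
def bfs_enqueue(nodes, successors, w):       # breadth-first: append at the back
    return nodes + successors

def dfs_enqueue(nodes, successors, w):       # depth-first: push onto the front
    return successors + nodes

def make_beam_enqueue(g, beam_size=5):       # beam search: keep the best nodes under g
    def enqueue(nodes, successors, w):
        return sorted(nodes + successors, key=lambda n: g(n, w), reverse=True)[:beam_size]
    return enqueue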
In LaSO, enqueue ranks nodes according to a function g that is linear in a set of features. The features can depend on the input x and on the path from the root to the current node n:
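In symbols, writing Φ(x, n) for the feature vector of node n given input x (notation assumed here), the ranking function is the usual linear score:

g(n) = \mathbf{w} \cdot \Phi(x, n) = \sum_i w_i \, \Phi_i(x, n)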
LaSO learns the feature weights from the training examples. The learning algorithm is shown below:
Nodes from which a path to the optimal goal still exists are called "y-good" nodes. The siblings function returns the set of nodes at the same depth as the current node that can still reach the goal (i.e., that are y-good); the algorithm may have to backtrack to construct them if they are not currently in the queue.
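A rough sketch of the learning loop, under the same assumptions as the search sketch above (is_y_good and siblings are hypothetical helpers; update is one of the two rules below):

def learn_one_example(x, y, initial, expand, is_goal, enqueue,
                      is_y_good, siblings, update, w):
    """Search with the current weights w; whenever no y-good node survives
    on the frontier (or a wrong goal is reached), update w and restart the
    queue from the y-good siblings."""
    nodes = [initial]
    while nodes:
        node = nodes.pop(0)
        frontier = [node] + nodes
        error = (not any(is_y_good(n, y) for n in frontier)
                 or (is_goal(node) and not is_y_good(node, y)))
        if error:
            good = siblings(node, y)              # y-good nodes at this depth
            w = update(w, x, good, frontier)      # move w toward good, away from frontier
            nodes = list(good)                    # restart the search from the good nodes
        else:
            if is_goal(node):
                return w                          # reached the goal with no mistakes
            nodes = enqueue(nodes, expand(node), w)
    return w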
If the search makes a mistake, the weights are updated with the update function. The two update functions proposed in the paper are the standard perceptron update and a variant of the approximate large margin algorithm (ALMA) update.
- Perceptron update rule: w is shifted by a vector Δ that points toward the (averaged) features of the y-good siblings and away from the (averaged) features of the nodes the search currently prefers.
- ALMA update rule: the same direction Δ is used, but the step is scaled by k^{-1/2} and the result is projected back onto the unit ball, where k starts at 1 and is incremented at each update.
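In symbols (a sketch following the descriptions above; C is a step-size constant and proj denotes projection onto the unit ball, so treat the exact constants as approximate):

\Delta = \frac{1}{|\mathrm{sibs}|} \sum_{n \in \mathrm{sibs}} \Phi(x, n) \;-\; \frac{1}{|\mathrm{nodes}|} \sum_{n \in \mathrm{nodes}} \Phi(x, n)

\text{Perceptron:} \quad \mathbf{w} \leftarrow \mathbf{w} + \Delta

\text{ALMA:} \quad \mathbf{w} \leftarrow \mathrm{proj}\big(\mathbf{w} + C\, k^{-1/2}\, \mathrm{proj}(\Delta)\big), \qquad \mathrm{proj}(u) = \frac{u}{\max\{1, \|u\|\}}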
Experimental Results
They performed experiments on two tasks, syntactic chunking and joint tagging and chunking, using the CoNLL 2000 dataset. The results are shown below. LASOP denotes LaSO trained with the perceptron update, and LASOA denotes LaSO trained with the ALMA update; the subscript number denotes the beam size.
Related Papers
In progress by User:Jmflanig