Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Citation and Link
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Summary
The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is basically SEARN but analyzed differently (and also ~24 pages shorter).
LaSO attempts to combine the learning of the model with the search that occurs during decoding. Instead of learning the model and then doing a search during decoding, LaSO attempts to directly learn to search.
Method
The generic search (decoding) algorithm is shown below:
The enqueue function puts the nodes onto the queue in some order. Depending on the order that the enqueue function puts nodes on the queue, you can get depth-first, breadth-first, beam, heuristic, A*, etc search algorithms from standard AI textbooks.
In LaSO enqueue ranks nodes according to a function g which is a linear in the set of features. The features can depend on the input x and the path to the current current node n:
LaSO learns the feature weights from the training examples. The learning algorithm is shown below:
Nodes for which there exists a path to the optimal goal are called "y-good" nodes. The siblings function denotes the set of nodes at the same depth as current node that can reach the goal (i.e. are y-good). The algorithm may have to backtrack to find them if they are not currently in the queue.
If the search makes a mistake, the weights are updated with the function update. The two functions they propose in the paper are the perceptron update and a variant of the approximate large margin update (ALMA).
- Perceptron update rule:
- ALMA update rule:
where
and k starts at 1 and is incremented at each update.
Experimental Result
Related Papers
In progress by User:Jmflanig