Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Citation and Link
Hal Daumé III and Daniel Marcu. Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction. ICML 2005.
Summary
The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is essentially SEARN analyzed differently (and the paper is about 24 pages shorter).
LaSO combines learning the model with the search that occurs during decoding: instead of first learning a model and then searching over its outputs at decoding time, LaSO learns to search directly.
Method
The generic search (decoding) algorithm is shown below:
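Roughly, it is the procedure in the following minimal Python sketch; make_node, goal_test, operators, and enqueue are placeholder names for the problem-specific hooks, not the paper's notation:

def search(x, weights, make_node, goal_test, operators, enqueue):
    # Generic search over (partial) structured outputs for input x.
    # The behaviour (depth-first, breadth-first, beam, A*, ...) is entirely
    # determined by how enqueue orders the queue.
    nodes = [make_node(x)]                           # queue of open search nodes
    while nodes:
        node = nodes.pop(0)                          # remove the front node
        if goal_test(node):
            return node                              # a complete output: stop and return it
        successors = operators(node)                 # expand the node
        nodes = enqueue(nodes, successors, weights)  # re-order / prune the queue
    return None                                      # search space exhausted without a goal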
The enqueue function puts nodes onto the queue in some order. Depending on that order, you recover depth-first, breadth-first, beam, heuristic, A*, and the other search algorithms from standard AI textbooks. What is different in LaSO is the enqueue function itself: it ranks nodes according to a score g that is a linear function of features, g(n) = w · Φ(x, n), where the features can depend on the input x and on the path to the current node n.
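For instance, a beam-style enqueue that ranks candidates by this linear score could look like the sketch below; features (playing the role of Φ) and beam_width are illustrative assumptions, not names from the paper:

import numpy as np

def make_beam_enqueue(features, x, beam_width=10):
    # Build an enqueue function that keeps only the beam_width highest-scoring
    # nodes, where each node n is scored by the linear function
    #   g(n) = w . Phi(x, n).
    def enqueue(nodes, successors, weights):
        candidates = list(nodes) + list(successors)
        candidates.sort(key=lambda n: float(np.dot(weights, features(x, n))),
                        reverse=True)
        return candidates[:beam_width]
    return enqueue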
LaSO works by learning the enqueue function from the training examples. The learning algorithm is shown below:
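A minimal Python sketch of that learning loop, under the same assumptions as the search sketch above (is_y_good, siblings, and update are placeholder hooks; "y-good" nodes, siblings, and the update rule are explained just after):

def laso_train_on_example(x, y, weights, make_node, goal_test, operators,
                          enqueue, is_y_good, siblings, update):
    # One LaSO pass over a single training example (x, y).
    # is_y_good(n, y): can the correct output y still be reached from n?
    # siblings(n, y):  the y-good nodes at the same depth as n.
    # update(...):     the weight update rule (e.g. perceptron-style, below).
    nodes = [make_node(x)]
    while nodes:
        node = nodes.pop(0)
        still_ok = any(is_y_good(n, y) for n in [node] + nodes)
        if not still_ok or (goal_test(node) and not is_y_good(node, y)):
            # Search error: nothing in memory can reach y, or we reached a wrong goal.
            good = siblings(node, y)
            weights = update(weights, x, good, [node] + nodes)
            nodes = list(good)          # resume the search from the y-good siblings
        else:
            if goal_test(node):
                return weights          # reached the correct goal; no update needed
            nodes = enqueue(nodes, operators(node), weights)
    return weights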
Nodes from which there still exists a path to the correct (optimal) goal are called "y-good" nodes. The siblings function returns the set of y-good nodes at the same depth as the current node; the algorithm may have to backtrack to find them if none are currently in the queue.
If the search makes a mistake, the weights are updated with the function update. The two updates proposed in the paper are a perceptron-style update (shown below) and a variant of the Approximate Large Margin Algorithm (ALMA).
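A sketch of the perceptron-style update: roughly following the paper, it adds the average feature vector of the y-good siblings and subtracts the average feature vector of the queue contents at the time of the error. The features argument and the partial binding are illustrative assumptions:

import numpy as np
from functools import partial

def perceptron_update(weights, x, good_nodes, error_nodes, features):
    # Move the weights toward the average feature vector of the y-good siblings
    # and away from the average feature vector of the nodes in the queue when
    # the error occurred.
    good_avg = np.mean([features(x, n) for n in good_nodes], axis=0)
    error_avg = np.mean([features(x, n) for n in error_nodes], axis=0)
    return weights + good_avg - error_avg

# The feature function can be bound so the signature matches the update hook
# used in the learning sketch above, e.g.:
# update = partial(perceptron_update, features=phi)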
Experimental Results
Related Papers
In progress by User:Jmflanig