Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Citation and Online Link
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction. An alternative formal analysis of SEARN.
Summary
The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is basically SEARN but analyzed differently (and the paper is about 24 pages shorter).
Like SEARN, LaSO attempts to combine the learning of the model with the search that occurs during decoding. Instead of learning the model and then searching during decoding for the best output <math>\hat{y}=\text{arg}\max_{y \in \mathcal{Y}} f(x,y;w)</math>, LaSO attempts to learn to search.
Method
The generic search (decoding) algorithm is shown below:
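The figure from the paper does not appear in this copy of the page. As a stand-in, here is a minimal Python sketch of the generic search loop it describes; all of the helper names (make_initial_node, is_goal, expand, enqueue) are placeholders passed in as arguments, not the paper's notation.

<pre>
def generic_search(x, w, enqueue, make_initial_node, is_goal, expand):
    # Generic search/decoding loop: the strategy (depth-first, breadth-first,
    # beam, A*, ...) is determined entirely by how enqueue orders the queue.
    queue = [make_initial_node(x)]
    while queue:
        node = queue.pop(0)                       # take the front node
        if is_goal(node):                         # a complete output was built
            return node
        successors = expand(node)                 # apply the search operators
        queue = enqueue(queue, successors, x, w)  # ordering decides everything
    return None                                   # search failed
</pre>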
The enqueue function puts the nodes onto the queue in some order. Depending on that order, you can get depth-first, breadth-first, beam, heuristic, A*, and other search algorithms from standard AI textbooks. What is different about LaSO is the enqueue function itself: it ranks the nodes according to g, a linear function of features that can depend on the input x and the path to the current node n: <math>g=\mathbf{\omega}^\top \mathbf{\Phi}(x,n)</math>
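As a concrete illustration (an assumption, not the paper's notation), a beam-style enqueue could score every candidate node with this linear function and keep only the top few; phi is a hypothetical feature extractor returning a {feature: value} dict, and w a weight dict.

<pre>
def linear_score(w, x, node, phi):
    # g = w^T Phi(x, n): weights dotted with features of the input x
    # and of the path to node n.
    return sum(w.get(f, 0.0) * v for f, v in phi(x, node).items())

def beam_enqueue(queue, successors, x, w, phi, beam_width=10):
    # Rank old and new nodes by g and keep only the best beam_width of them.
    candidates = queue + successors
    candidates.sort(key=lambda n: linear_score(w, x, n, phi), reverse=True)
    return candidates[:beam_width]
</pre>

Binding phi and beam_width (e.g. with functools.partial) gives a function of the form enqueue(queue, successors, x, w) that plugs directly into the search loop above.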
LaSO works by learning the enqueue function from the training examples. The learning algorithm is shown below:
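The corresponding figure is also missing here; the following is a hedged Python sketch of the training loop under the same assumptions as the search sketch above. The y-good and siblings notions it relies on are explained in the next paragraph.

<pre>
def laso_train(examples, w, enqueue, update, make_initial_node,
               is_goal, expand, is_y_good, siblings, epochs=5):
    # Run the same search used at decoding time; whenever the correct output
    # can no longer be reached, update w and restart from the y-good siblings.
    for _ in range(epochs):
        for x, y in examples:
            queue = [make_initial_node(x)]
            while queue:
                node = queue.pop(0)
                error = not any(is_y_good(n, y) for n in [node] + queue)
                error = error or (is_goal(node) and not is_y_good(node, y))
                if error:
                    good = siblings(node, y)          # y-good nodes at this depth
                    w = update(w, x, good, [node] + queue)
                    queue = list(good)                # resume the search from them
                elif is_goal(node):                   # correct output reached
                    break
                else:
                    queue = enqueue(queue, expand(node), x, w)
    return w
</pre>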
Nodes from which there exists a path to the optimal goal are called "y-good" nodes. The siblings function denotes the set of y-good nodes at the same depth as the current node (i.e., nodes that can still reach the goal). The algorithm may have to backtrack to find them if they are not currently in the queue.
If the search makes a mistake, the weights are updated with the function update. The two functions they propose in the paper are the perceptron update (shown below) and a variant of the approximate large margin update (ALMA).
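The perceptron update appears as an image on the original page; the sketch below captures its general form (moving the weights toward the average features of the y-good siblings and away from the average features of the erroneous queued nodes). Feature vectors are assumed to be {feature: value} dicts produced by the same hypothetical phi(x, node) as above.

<pre>
def perceptron_update(w, x, good_nodes, bad_nodes, phi):
    # w <- w + mean Phi(x, n) over the y-good siblings
    #        - mean Phi(x, n) over the erroneous queued nodes.
    def mean_features(nodes):
        avg = {}
        for n in nodes:
            for f, v in phi(x, n).items():
                avg[f] = avg.get(f, 0.0) + v / len(nodes)
        return avg
    new_w = dict(w)
    for f, v in mean_features(good_nodes).items():
        new_w[f] = new_w.get(f, 0.0) + v
    for f, v in mean_features(bad_nodes).items():
        new_w[f] = new_w.get(f, 0.0) - v
    return new_w
</pre>

With phi bound in advance, this has the update(w, x, good, bad) signature used in the training sketch above.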
Experimental Result
Related Papers
In progress by User:Jmflanig