Citation and Link
Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Summary
The authors present the Learning as Search Optimization (LaSO) framework. The algorithm is basically SEARN but analyzed differently (and also ~24 pages shorter).
LaSO attempts to combine the learning of the model with the search that occurs during decoding. Instead of learning the model and then doing a search during decoding, LaSO attempts to directly learn to search.
Method
The generic search (decoding) algorithm is shown below:
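Roughly, the generic loop can be sketched in Python as follows (the problem object and its is_goal/expand methods are illustrative assumptions, not the paper's notation):

<syntaxhighlight lang="python">
def generic_search(problem, initial, enqueue):
    """Generic queue-driven search: `enqueue` alone determines the search strategy."""
    nodes = [initial]                          # the queue, seeded with the start node
    while nodes:
        node = nodes.pop(0)                    # take the front node off the queue
        if problem.is_goal(node):              # stop when a goal node is dequeued
            return node
        successors = problem.expand(node)      # apply the search operators
        nodes = enqueue(problem, nodes, successors)
    return None                                # queue exhausted: search failed
</syntaxhighlight>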
The enqueue function puts new nodes onto the queue in some order. Depending on that order, you recover the standard depth-first, breadth-first, beam, heuristic, and A* search algorithms from AI textbooks.
In LaSO, enqueue ranks nodes according to a function g that is linear in a set of features, <math>g(n) = \mathbf{\omega} \cdot \mathbf{\Phi}(x, n)</math>, where the features <math>\mathbf{\Phi}</math> can depend on the input x and on the path to the current node n.
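For instance, a beam-style enqueue that ranks candidates by this linear score could be sketched as follows (the features function, the beam width, and the use of NumPy are assumptions made for illustration):

<syntaxhighlight lang="python">
import numpy as np

def make_linear_enqueue(weights, features, x, beam_width=10):
    """Build an enqueue that ranks nodes by g(n) = w . Phi(x, n) and keeps a beam."""
    def enqueue(problem, nodes, successors):
        candidates = list(nodes) + list(successors)
        # Sort all candidates by their linear score, best first, and keep the top beam_width.
        candidates.sort(key=lambda n: float(np.dot(weights, features(x, n))), reverse=True)
        return candidates[:beam_width]
    return enqueue
</syntaxhighlight>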
LaSO learns the feature weights from the training examples. The learning algorithm is shown below:
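A rough Python sketch of that training loop on a single example (x, y), following the description here; the is_y_good and siblings helpers are stand-ins for the concepts explained below:

<syntaxhighlight lang="python">
def laso_learn_example(problem, initial, enqueue, update, weights, x, y):
    """One LaSO training pass on (x, y): search, and update the weights on errors."""
    nodes = [initial]
    while nodes:
        node = nodes.pop(0)
        current = [node] + nodes
        # An error: no node that can still reach the correct output y is left,
        # or we reached a goal node that is not the correct output.
        error = (not any(is_y_good(n, y) for n in current)) or \
                (problem.is_goal(node) and not is_y_good(node, y))
        if error:
            sibs = siblings(node, y)                     # y-good nodes at this depth
            weights = update(weights, x, sibs, current)  # perceptron or ALMA-style update
            nodes = list(sibs)                           # resume search from the y-good siblings
        else:
            if problem.is_goal(node):
                return weights                           # reached the correct output
            nodes = enqueue(problem, nodes, problem.expand(node))
    return weights
</syntaxhighlight>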
Nodes for which there exists a path to the optimal goal are called "y-good" nodes. The siblings function denotes the set of nodes at the same depth as the current node that can reach the goal (i.e. are y-good). The algorithm may have to backtrack to find them if they are not currently in the queue.
If the search makes a mistake, the weights are updated with the function update. The two update functions proposed in the paper are the perceptron update and a variant of the approximate large margin algorithm (ALMA) update.
- Perceptron update rule:

<math>\mathbf{\omega} \leftarrow \mathbf{\omega} + \Delta</math>

- ALMA update rule:

<math>\mathbf{\omega} \leftarrow \wp(\mathbf{\omega} + C k^{-1/2} \wp(\Delta))</math>

where

<math>\wp(\mathbf{u}) = \mathbf{u} / \max \{1, \lVert \mathbf{u} \rVert_{2} \}</math>

projects a vector back into the unit ball, and k starts at 1 and is incremented at each update. In both rules, Δ is the difference between the features of the y-good siblings and the features of the nodes in the queue when the error was made.
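A small NumPy sketch of these two update rules (the delta vector is assumed to be precomputed from the features as described above):

<syntaxhighlight lang="python">
import numpy as np

def perceptron_update(weights, delta):
    """Additive perceptron-style update."""
    return weights + delta

def project(u):
    """Scale u back into the unit L2 ball: u / max(1, ||u||_2)."""
    return u / max(1.0, float(np.linalg.norm(u)))

def alma_update(weights, delta, k, C=1.0):
    """ALMA-style large margin update; k counts the updates made so far, starting at 1."""
    return project(weights + C * k ** -0.5 * project(delta))
</syntaxhighlight>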
Experimental Result
Related Papers
In progress by User:Jmflanig