Daume and Marcu 2005 Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction

This is an article about a [[Category::paper]].

== Citation and Link ==

Hal Daumé III and Daniel Marcu. 2005. Learning as search optimization: approximate large margin methods for structured prediction. In ''Proceedings of the 22nd International Conference on Machine Learning'' (ICML '05). ACM, New York, NY, USA, 169-176.

[http://hal3.name/docs/daume05laso.pdf Pdf version]

== Summary ==

The authors present the Learning as Search Optimization (LaSO) framework for structured prediction.  The algorithm is basically [[UsesMethod::SEARN]] but analyzed differently (and also ~24 pages shorter).

LaSO attempts to combine learning the model with the search that occurs during decoding for structured prediction.  Instead of learning the model and then doing a search during decoding to find the best output

<math>\hat{y}=\text{arg}\max_{y \in \mathcal{Y}} f(x,y;w),</math>

LaSO attempts to directly learn to search.

== Method ==
 
 
The generic search (decoding) algorithm is shown below:

[[File:LaSO Generic Search.png]]
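
The figure itself is not reproduced here, but the loop it describes is a standard agenda-based search. A minimal Python sketch of that loop, with ''operators'', ''goal_test'', and ''enqueue'' passed in as placeholder callables (the names are chosen here for illustration and are not the paper's notation):

<pre>
def search(initial_state, operators, goal_test, enqueue, weights):
    """Generic search/decoding loop (a sketch of the procedure in the figure).

    operators(node) -> list of successor nodes
    goal_test(node) -> True if node corresponds to a complete output
    enqueue(queue, successors, weights) -> new queue (this is the learned part)
    """
    queue = [initial_state]
    while queue:
        node = queue.pop(0)                      # remove the front node
        if goal_test(node):
            return node                          # found a complete output
        successors = operators(node)             # expand the node
        queue = enqueue(queue, successors, weights)
    return None                                  # search space exhausted
</pre>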
  
The ''enqueue'' function puts the nodes onto the queue in some order.  Depending on the order in which ''enqueue'' puts nodes on the queue, you can get the depth-first, breadth-first, beam, heuristic, A*, etc. search algorithms from standard AI textbooks.

In LaSO, ''enqueue'' ranks nodes according to a function ''g'' that is linear in a set of features.  The features can depend on the input ''x'' and the path to the current node ''n'':

<math>g=\mathbf{\omega}^\top \mathbf{\Phi}(x,n)</math>
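
For concreteness, a beam-search version of ''enqueue'' under this linear ranking function might look like the sketch below. The feature map ''phi'', the input ''x'', and the beam width are illustrative parameters, not notation from the paper; the returned function plugs into the generic search loop sketched above.

<pre>
import numpy as np

def make_beam_enqueue(phi, x, beam_width=10):
    """Build an enqueue function that ranks nodes by g(n) = w . Phi(x, n)."""
    def enqueue(queue, successors, weights):
        candidates = list(queue) + list(successors)
        # Score every candidate node with the linear function g described above.
        scores = [float(np.dot(weights, phi(x, n))) for n in candidates]
        order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
        return [candidates[i] for i in order[:beam_width]]   # keep only the beam
    return enqueue
</pre>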
  
LaSO learns the feature weights from the training examples.  The learning algorithm is shown below:

[[File:LaSO Algorithm.png]]
  
Nodes from which there exists a path to the optimal goal are called "y-good" nodes.  The ''siblings'' function denotes the set of nodes at the same depth as the current node that can reach the goal (i.e. are y-good).  The algorithm may have to backtrack to find them if they are not currently in the queue.

If the search makes a mistake, the weights are updated with the function ''update''.  The two update methods proposed in the paper are the [[UsesMethod::Voted Perceptron]] and a variant of the approximate large margin algorithm (ALMA).
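
Putting these pieces together, the learning algorithm in the figure above can be paraphrased roughly as the following per-example loop. This is only a sketch: ''is_y_good'', ''siblings'', and ''update'' are placeholders for the components defined in the figure, and the exact error condition should be taken from the paper.

<pre>
def laso_train_example(x, y, weights, initial_state, operators, goal_test,
                       enqueue, is_y_good, siblings, update):
    """One LaSO-style training pass over a single example (x, y).

    is_y_good(node, y) -> can the correct output y still be reached from node?
    siblings(node, y)  -> the y-good nodes at the same depth as node
    update(weights, x, good_nodes, bad_nodes) -> new weights (perceptron or ALMA)
    """
    queue = [initial_state]
    while queue:
        node = queue.pop(0)
        no_good_left = not any(is_y_good(n, y) for n in [node] + queue)
        if no_good_left or (goal_test(node) and not is_y_good(node, y)):
            good = siblings(node, y)       # y-good nodes to move toward
            bad = [node] + queue           # the current (mistaken) queue contents
            weights = update(weights, x, good, bad)
            queue = list(good)             # resume the search from the y-good siblings
        elif goal_test(node):
            break                          # reached a y-good goal; no update needed
        else:
            queue = enqueue(queue, operators(node), weights)
    return weights
</pre>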

* '''Perceptron update rule:'''

<math>\mathbf{\omega} \leftarrow \mathbf{\omega} + \Delta</math>

[[File:LaSO Perceptron Update.png]]
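
A sketch of the perceptron update, assuming Δ is the difference between the average feature vector of the y-good nodes and that of the current queue contents (the precise definition of Δ is given in the figure above, so treat this as an approximation):

<pre>
import numpy as np

def perceptron_update(weights, x, good_nodes, bad_nodes, phi):
    """Perceptron-style update w <- w + Delta (sketch; see the figure for Delta)."""
    good_mean = np.mean([phi(x, n) for n in good_nodes], axis=0)
    bad_mean = np.mean([phi(x, n) for n in bad_nodes], axis=0)
    delta = good_mean - bad_mean
    return weights + delta
</pre>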

* '''ALMA update rule:'''

<math>\mathbf{\omega} \leftarrow \wp(\mathbf{\omega} + Ck^{-1/2} \wp(\Delta))</math>

where

<math>\wp(\mathbf{u}) = \mathbf{u} / \max \{1,\lVert \mathbf{u} \rVert_{2} \}</math>

and ''k'' starts at 1 and is incremented at each update.
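
A sketch of the ALMA-style update under the same assumptions as the perceptron sketch, with ''C'' treated as a step-size constant:

<pre>
import numpy as np

def project_to_unit_ball(u):
    """The projection from the text: u / max(1, ||u||_2)."""
    return u / max(1.0, float(np.linalg.norm(u)))

def alma_update(weights, delta, k, C=1.0):
    """ALMA-style update: w <- proj(w + C * k**(-1/2) * proj(delta)).

    k is the update counter (starts at 1 and is incremented after every update).
    """
    step = C / np.sqrt(k)
    new_weights = project_to_unit_ball(weights + step * project_to_unit_ball(delta))
    return new_weights, k + 1
</pre>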

== Experimental Result ==

They performed experiments on two tasks: syntactic chunking and joint tagging and chunking.  They used the CoNLL 2000 dataset.  The results are shown below.  LASOP denotes LaSO trained with the perceptron, and LASOA denotes LaSO trained with ALMA.  The subscript number denotes the beam size.

[[File:LaSO Table 1.png]]

[[File:LaSO Table 2.png]]
  
== Related Papers and Pages ==

LaSO attempts to do the same thing as [[RelatedPaper::Daume et al, ML 2009]], but accomplishes it in a slightly different framework.
