== Citation ==
  
Daumé III, H., Langford, J., and Marcu, D. 2006. SEARN in Practice. Unpublished manuscript.
  
== Online version ==
[http://hal3.name/docs/daume06searn-practice.pdf SEARN in Practice]
== Summary ==
This unpublished [[Category::Paper|manuscript]] describes how [[UsesMethod::SEARN]] can be applied to three Natural Language Processing tasks: [[AddressesProblem::Sequence Labeling]], [[AddressesProblem::Parsing]], and [[AddressesProblem::Machine Translation]].

The key points of the paper are:
* The authors state that [[UsesMethod::SEARN]] is efficient, widely applicable, theoretically justified, and simple.
* [[UsesMethod::SEARN]] views problems as search problems, and learns classifiers that traverse the search space effectively (see the sketch below).
* The authors work through three sample problems: [[AddressesProblem::Sequence Labeling]], [[AddressesProblem::Parsing]], and [[AddressesProblem::Machine Translation]].
* The efficacy of [[UsesMethod::SEARN]] hinges on the ability to compute an optimal or near-optimal policy. When an optimal policy is not available, the authors suggest performing explicit search as an approximation. For segmentation and parsing the optimal policy has a closed form; for summarization and machine translation it is not available.
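
The loop below is a minimal sketch of this idea, not the authors' implementation: SEARN starts from the optimal policy, collects cost-sensitive classification examples along the paths the current policy visits, and stochastically mixes each newly learned classifier into the policy. All task-specific operations (<tt>initial_state</tt>, <tt>actions</tt>, <tt>step</tt>, <tt>features</tt>, <tt>rollout_loss</tt>) are hypothetical placeholders.

<pre>
import random

def searn(examples, optimal_policy, train_classifier, task,
          iterations=5, beta=0.3):
    """Illustrative SEARN loop (hypothetical interfaces, not the authors' code).

    examples:         list of (x, y) structured input/output pairs
    optimal_policy:   (state, y) -> best next action, assumed computable
    train_classifier: cost-sensitive examples -> h(features) -> action
    task:             object bundling the hypothetical task operations
    """
    learned = []  # classifiers mixed into the policy, one per iteration

    def current_policy(state, y):
        # With probability beta, defer to a learned classifier; otherwise
        # fall back toward the optimal policy (which needs the truth y).
        for h in reversed(learned):
            if random.random() < beta:
                return h(task.features(state))
        return optimal_policy(state, y)

    for _ in range(iterations):
        data = []
        for x, y in examples:
            state = task.initial_state(x)
            while not task.is_final(state):
                # Cost of an action = loss of the full output obtained by
                # taking it and then following the current policy.
                costs = {a: task.rollout_loss(state, a, current_policy, y)
                         for a in task.actions(state)}
                data.append((task.features(state), costs))
                state = task.step(state, current_policy(state, y))
        learned.append(train_classifier(data))

    # In SEARN proper the final policy is the learned mixture with the
    # optimal policy removed; returning the last classifier keeps the
    # sketch simple.
    return learned[-1]
</pre>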
  
== Example SEARN Usage ==
  
'''Sequence Labeling'''
* Discusses SEARN's application to [[AddressesProblem::POS tagging]] and [[AddressesProblem::NP chunking]].
  
''Tagging''
* Task is to produce a label sequence from an input sequence.
* Search framed as left-to-right greedy search.
* ''Loss function'': Hamming loss
* ''Optimal policy'':
[[File:op-tagging.png]]
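
The pictured policy is simple: under Hamming loss, the best next action is the true label at the current position, no matter what mistakes were made earlier. A minimal sketch (names illustrative):

<pre>
def optimal_tagging_policy(true_tags, position):
    # Under Hamming loss, past errors do not change the best next action:
    # it is always the true label at the current position.
    return true_tags[position]

def hamming_loss(predicted, true_tags):
    # Number of positions where prediction and truth disagree.
    return sum(p != t for p, t in zip(predicted, true_tags))

# Following the optimal policy reproduces the gold sequence exactly.
gold = ["DT", "NN", "VBZ"]
pred = [optimal_tagging_policy(gold, i) for i in range(len(gold))]
assert hamming_loss(pred, gold) == 0
</pre>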
  
''NP Chunking''
* Chunking is a joint segmentation and labeling problem.
* ''Loss function'': F1 measure
* ''Optimal policy'':
[[File:op-chunking.png]]
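
One way to picture the closed-form policy: if segmentation decisions are encoded as per-word BIO actions (an illustrative encoding, not necessarily the paper's exact action set), the optimal next action is again just the gold action at the current word:

<pre>
def gold_bio_tags(chunks, length):
    # chunks: gold segments as (start, end_exclusive, label) triples.
    tags = ["O"] * length
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags

def optimal_chunking_policy(chunks, length, position):
    # As in tagging, emit the gold action at the current word.
    return gold_bio_tags(chunks, length)[position]

# "[The cat] sat": an NP chunk covering words 0-1.
print(optimal_chunking_policy([(0, 2, "NP")], 3, 1))  # -> I-NP
</pre>
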
'''Parsing'''
* Looks at dependency parsing in a shift-reduce framework.
* ''Loss function'': Hamming loss over dependencies.
* ''Decisions'': shift/reduce
* ''Optimal policy'':
[[File:op-parsing.png]]
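
For parsing, the closed-form policy is a parsing oracle. The sketch below assumes an arc-standard transition system (an assumption; the manuscript's exact transition set may differ):

<pre>
def optimal_parsing_action(stack, buffer, gold_heads):
    # Oracle for (assumed) arc-standard shift-reduce dependency parsing.
    # stack/buffer hold word indices; gold_heads[i] is word i's gold head.
    if len(stack) >= 2:
        top, below = stack[-1], stack[-2]
        # Attach `below` as a left dependent of `top`.
        if gold_heads[below] == top:
            return "left-arc"
        # Attach `top` under `below`, but only once `top` has collected
        # all of its own dependents still waiting in the buffer.
        if gold_heads[top] == below and all(gold_heads[w] != top for w in buffer):
            return "right-arc"
    return "shift"

# "economic news rose": economic <- news <- rose (heads: [1, 2, -1]).
print(optimal_parsing_action([0, 1], [2], [1, 2, -1]))  # -> left-arc
</pre>
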
'''Machine Translation'''
* Frames the task as left-to-right translation.
* The search space is over prefixes of translations.
* Actions add a word (or phrase) to the end of the existing translation.
* ''Loss function'': 1 - BLEU or 1 - NIST
* ''Optimal policy'': given a set of reference translations R and an English prefix e<sub>1</sub>, ..., e<sub>i-1</sub>, decide which word (or phrase) should be produced next, or whether the translation is finished.
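
Because no closed-form optimal policy exists here, anything concrete is an approximation. The toy sketch below (single reference, word-level actions, names illustrative) emits the first reference word not yet covered by the prefix; a real approximation would search over phrase continuations and score them with BLEU or NIST:

<pre>
from collections import Counter

def approx_mt_policy(prefix, reference):
    # Toy stand-in for the unavailable optimal policy: emit the first
    # reference word the prefix has not yet covered (multiset coverage),
    # or signal completion once the reference is exhausted.
    covered = Counter(prefix)
    for word in reference:
        if covered[word] > 0:
            covered[word] -= 1
        else:
            return word
    return "<finish>"

print(approx_mt_policy(["the"], ["the", "cat", "sat"]))  # -> cat
</pre>
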
== Related papers ==
* '''Search-based Structured Prediction''': This is the journal version of the paper that introduces the [[UsesMethod::SEARN]] algorithm - [[RelatedPaper::Daume_et_al,_ML_2009]].
