Difference between revisions of "User talk:Xxiong"

From Cohen Courses
  
 
Repetitions within a single document are more common in newswire, while repetitions across multiple documents are more common in email.
 
 
== Example SEARN Usage ==
 
 
'''Sequence Labeling'''
 
* Discussed SEARN's application to [[AddressesProblem::POS tagging]] and [[AddressesProblem::NP chunking]]
 
 
''Tagging''
 
* Task is to produce a label sequence from an input sequence.
 
* Search is framed as a left-to-right greedy search.
 
* ''Loss function'': Hamming loss
 
* Optimal Policy:
 
[[File:op-tagging.png]]
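A minimal Python sketch of tagging under Hamming loss, assuming labels are strings and the gold sequence is available at training time. Because Hamming loss scores each position independently, the best next action is the gold label for the next position, even when the prefix already contains mistakes. The helper names (`hamming_loss`, `optimal_tagging_action`, `greedy_left_to_right`) are illustrative and not taken from the SEARN paper.

```python
from typing import Callable, List, Sequence


def hamming_loss(predicted: Sequence[str], gold: Sequence[str]) -> int:
    """Number of positions where the predicted label differs from the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("sequences must have equal length")
    return sum(p != g for p, g in zip(predicted, gold))


def optimal_tagging_action(gold: Sequence[str], prefix: Sequence[str]) -> str:
    """Under Hamming loss, pick the gold label at the next position.

    Earlier mistakes in the prefix cannot be undone and do not change the
    cost of later positions, so they are ignored.
    """
    return gold[len(prefix)]


def greedy_left_to_right(n: int, policy: Callable[[List[str]], str]) -> List[str]:
    """Build a label sequence one position at a time, left to right."""
    labels: List[str] = []
    for _ in range(n):
        labels.append(policy(labels))
    return labels
```

For example, with gold tags `["DT", "NN", "VBD"]`, running `greedy_left_to_right` with the optimal policy reproduces the gold sequence, and the prediction `["DT", "VB", "VBD"]` has a Hamming loss of 1.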
 
 
 
''NP Chunking''
 
* Chunking is a joint segmentation and labeling problem.
 
* ''Loss function'': F1 measure
 
* Optimal Policy:
 
[[File:op-chunking.png]]
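Under an F1 loss, the cost depends on whole chunks rather than individual positions, which is what makes chunking a joint segmentation and labeling problem. The sketch below assumes BIO tags (the notes do not specify an encoding), represents chunks as hypothetical `(start, end_exclusive, label)` tuples, and takes the loss to be 1 - F1 over exactly matching chunks.

```python
from typing import Iterable, Set, Tuple

# Hypothetical chunk representation: (start, end_exclusive, label)
Chunk = Tuple[int, int, str]


def bio_to_chunks(tags: Iterable[str]) -> Set[Chunk]:
    """Decode BIO tags (e.g. B-NP, I-NP, O) into a set of labeled spans."""
    chunks: Set[Chunk] = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final chunk
        # A chunk ends at O, at any B- tag, or at an I- tag that does not continue it.
        if tag == "O" or tag.startswith("B-") or tag[2:] != label:
            if start is not None:
                chunks.add((start, i, label))
            start, label = (None, None) if tag == "O" else (i, tag[2:])
    return chunks


def chunk_f1(predicted: Set[Chunk], gold: Set[Chunk]) -> float:
    """F1 over chunks that match exactly in both span and label."""
    if not predicted and not gold:
        return 1.0
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def chunking_loss(predicted: Set[Chunk], gold: Set[Chunk]) -> float:
    """Loss taken as 1 - F1."""
    return 1.0 - chunk_f1(predicted, gold)
```

Predicting `B-NP O O B-NP` against the gold `B-NP I-NP O B-NP` gets one of two chunks right (precision and recall both 0.5), so the loss is 0.5.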
 
 
'''Parsing'''
 
* Looked at dependency parsing with a shift-reduce framework.
 
* ''Loss function'': Hamming loss over dependencies.
 
* ''Decisions'': shift/reduce
 
* ''Optimal Policy'':
 
[[File:op-parsing.png]]
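To make shift/reduce decisions and the Hamming loss over dependencies concrete, here is a sketch that uses the arc-standard transition system (SHIFT, LEFT, RIGHT). This is a common shift-reduce variant swapped in for illustration, and the exact transition set used in the SEARN experiments may differ. The head convention (tokens numbered from 1, head 0 meaning the root) is also an assumption.

```python
from typing import List, Sequence


def dependency_hamming_loss(pred_heads: Sequence[int], gold_heads: Sequence[int]) -> int:
    """Count tokens whose predicted head differs from the gold head."""
    if len(pred_heads) != len(gold_heads):
        raise ValueError("head lists must have equal length")
    return sum(p != g for p, g in zip(pred_heads, gold_heads))


def run_arc_standard(n_tokens: int, actions: Sequence[str]) -> List[int]:
    """Apply arc-standard actions; return the head of each token 1..n (0 = root)."""
    heads = [0] * (n_tokens + 1)  # index 0 is the root and is not returned
    stack = [0]  # the root starts on the stack
    buffer = list(range(1, n_tokens + 1))
    for action in actions:
        if action == "SHIFT":  # move the next input token onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT":  # top of stack becomes head of the item below it
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT":  # item below the top becomes head of the top
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"unknown action {action!r}")
    return heads[1:]
```

For "She eats fish", the sequence SHIFT, SHIFT, LEFT, SHIFT, RIGHT, RIGHT yields heads `[2, 0, 2]` (She and fish attach to eats, and eats attaches to the root). A prediction that attaches "She" to the root instead would have a loss of 1.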
 
 
'''Machine Translation'''
 
* Framed task as a left-to-right translation problem.
 
* Search space over prefixes of translations.
 
* Actions are adding a word (or phrase) to the end of the existing translation.
 
* ''Loss function'': 1 - BLEU or 1 - NIST
 
* ''Optimal policy'': given a set of reference translations R and an English translation prefix e_1, ..., e_{i-1}, decide which word (or phrase) should be produced next, or whether the translation is finished.
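A sketch of the search state, actions, and loss for left-to-right translation, assuming translations are lists of word tokens. `apply_action`, the `END` marker, and `sentence_bleu` are hypothetical names. The BLEU here is a simplified, add-one-smoothed sentence-level variant, so its values will not match standard corpus BLEU; the 1 - NIST loss is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence

END = "<end>"  # hypothetical action meaning "the translation is finished"


def apply_action(prefix: List[str], action: str) -> List[str]:
    """Extend the translation prefix by one word; END leaves it unchanged."""
    return list(prefix) if action == END else list(prefix) + [action]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str],
                  references: Sequence[Sequence[str]],
                  max_n: int = 4) -> float:
    """Simplified sentence-level BLEU with add-one smoothed n-gram precisions."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        # Clip each candidate n-gram count by its largest count in any one reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        log_precision += math.log((matched + 1) / (total + 1))
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity = 1.0 if c >= r else math.exp(1 - r / c)
    return brevity * math.exp(log_precision / max_n)


def bleu_loss(candidate: Sequence[str], references: Sequence[Sequence[str]]) -> float:
    """Loss for translation taken as 1 - BLEU."""
    return 1.0 - sentence_bleu(candidate, references)
```

A candidate identical to a reference gets a loss of 0, and an empty prefix gets a loss of 1. In SEARN terms, each `apply_action` call is one decision, and the loss is evaluated once `END` is chosen.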
 
  
 
== Related papers ==
 
  
 
* '''Search-based Structured Prediction''': This is the journal version of the paper that introduces the [[UsesMethod::SEARN]] algorithm - [[RelatedPaper::Daume_et_al,_ML_2009]].
 

Revision as of 15:36, 8 October 2010

Citation

Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005

Online version

Extracting Personal Names from Emails

Summary

Task: NER from emails

Techniques: NER is treated as a tagging problem, and a CRF model is used for this task.

Contribution:

  • email-specific feature set

Repetitions within a single document are more common in newswire, while repetitions across multiple documents are more common in email.

Related papers

  • Search-based Structured Prediction: This is the journal version of the paper that introduces the SEARN algorithm - Daume_et_al,_ML_2009.