Difference between revisions of "User talk:Xxiong"

From Cohen Courses
== Citation ==
 
  
Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005
 
 
== Online version ==
 
 
[http://www.cs.cmu.edu/~einat/email.pdf Extracting Personal Names from Emails]
 
 
== Summary ==
 
Task: named entity recognition (NER) on email text, specifically extracting personal names.
 
 
Techniques: NER is treated as a sequence tagging problem, and a CRF model is used for the tagging.
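
To make "NER as tagging" concrete, here is a minimal sketch that turns name spans into one label per token, which is the kind of sequence a CRF tagger would be trained to predict. The BIO scheme, the PER label, the function name spans_to_bio, and the sample tokens are assumptions for illustration and are not taken from the paper.

```python
def spans_to_bio(tokens, spans, label="PER"):
    """Convert (start, end) token spans (end exclusive) into BIO tags.

    Hypothetical helper: the tag scheme used in the paper may differ.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the name
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the name
    return tags


# Made-up email sentence with two personal names.
tokens = ["Hi", "Einat", "Minkov", ",", "please", "call", "William", "."]
name_spans = [(1, 3), (6, 7)]
tags = spans_to_bio(tokens, name_spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each token receives exactly one tag, so the tagger's output has the same length as its input.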
 
 
Contribution:
 
* email-specific feature set
 
 
Name repetitions within a single document are more common in newswire, while repetitions across multiple documents are more common in emails.
 
 
== Example SEARN Usage ==
 
 
'''Sequence Labeling'''
 
* Discussed SEARN's application to [[AddressesProblem::POS tagging]] and [[AddressesProblem::NP chunking]]
 
 
''Tagging''
 
* Task is to produce a label sequence from an input sequence.
 
* Search framed as left-to-right greedy search.
 
* ''Loss function'': Hamming loss
 
* Optimal Policy:
 
[[File:op-tagging.png]]
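
The tagging setup above can be sketched in a few lines. This is an illustration under stated assumptions, not the formula from the image: it assumes the gold label sequence is available, and it uses the fact that under Hamming loss the best next action is the gold label at the next position, whatever mistakes were made earlier. The function names and the POS tags are hypothetical.

```python
def hamming_loss(gold, predicted):
    """Number of positions where the predicted label differs from the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("sequences must have equal length")
    return sum(g != p for g, p in zip(gold, predicted))


def optimal_policy(gold, prefix):
    """Best next label under Hamming loss: the gold label at the next position.

    Earlier errors in `prefix` cannot be undone and do not change this choice.
    """
    return gold[len(prefix)]


def greedy_decode(policy, length):
    """Left-to-right greedy search: ask the policy for one label at a time."""
    output = []
    for _ in range(length):
        output.append(policy(output))
    return output


gold = ["DT", "NN", "VBZ", "JJ"]
decoded = greedy_decode(lambda prefix: optimal_policy(gold, prefix), len(gold))
print(decoded, hamming_loss(gold, decoded))
```

In SEARN the learned classifier takes the place of the lambda at test time, when the gold sequence is not available.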
 
 
 
''NP Chunking''
 
* Chunking is a joint segmentation and labeling problem.
 
* ''Loss function'': F1 measure
 
* Optimal Policy:
 
[[File:op-chunking.png]]
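
Because chunking is scored on whole chunks rather than on individual labels, an F1 computation has to extract chunks first. The sketch below assumes BIO-style tags and exact-match scoring of (start, end, type) triples. Both are assumptions for illustration, and the helper names are hypothetical.

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) chunks in a BIO tag list (end exclusive)."""
    chunks = set()
    start, chunk_type = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing chunk
        # An open chunk ends unless this tag continues it with a matching I- tag.
        continues = tag.startswith("I-") and chunk_type == tag[2:]
        if start is not None and not continues:
            chunks.add((start, i, chunk_type))
            start, chunk_type = None, None
        # B- always opens a chunk; a stray I- is treated as opening one too.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, chunk_type = i, tag[2:]
    return chunks


def chunk_f1(gold_tags, pred_tags):
    """Chunk-level F1; returns 0.0 when no predicted chunk matches a gold chunk."""
    gold, pred = extract_chunks(gold_tags), extract_chunks(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_tags = ["B-NP", "I-NP", "O", "B-NP"]
pred_tags = ["B-NP", "I-NP", "O", "O"]
print(round(chunk_f1(gold_tags, pred_tags), 3))
```

Here the prediction finds one of the two gold chunks and proposes no wrong ones, so precision is 1 and recall is 0.5.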
 
 
'''Parsing'''
 
* Looked at dependency parsing with a shift-reduce framework.
 
* ''Loss function'': Hamming loss over dependencies.
 
* ''Decisions'': shift/reduce
 
* ''Optimal Policy'':
 
[[File:op-parsing.png]]
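
The following sketch shows how a sequence of shift/reduce decisions can build a dependency tree and how Hamming loss over dependencies can then be computed. It is a simplified illustration and not the parser from the paper: splitting reduce into "reduce-left" and "reduce-right", the head-array encoding, and all function names are assumptions.

```python
def run_shift_reduce(num_tokens, actions):
    """Apply actions and return the head of each token.

    Tokens are numbered 1..num_tokens; head 0 marks the root.
    """
    heads = [0] * num_tokens            # heads[i - 1] is the head of token i
    stack, next_token = [], 1
    for action in actions:
        if action == "shift":
            if next_token > num_tokens:
                raise ValueError("nothing left to shift")
            stack.append(next_token)
            next_token += 1
        elif action in ("reduce-left", "reduce-right"):
            if len(stack) < 2:
                raise ValueError("reduce needs two items on the stack")
            top, below = stack[-1], stack[-2]
            if action == "reduce-left":
                heads[below - 1] = top  # item below the top becomes its dependent
                del stack[-2]
            else:
                heads[top - 1] = below  # top item becomes dependent of the one below
                stack.pop()
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


def dependency_hamming_loss(gold_heads, pred_heads):
    """Number of tokens whose predicted head differs from the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("head lists must have equal length")
    return sum(g != p for g, p in zip(gold_heads, pred_heads))


# "the cat sat": the -> cat, cat -> sat, sat is the root.
gold_heads = [2, 3, 0]
good = run_shift_reduce(3, ["shift", "shift", "reduce-left", "shift", "reduce-left"])
bad = run_shift_reduce(3, ["shift", "shift", "shift", "reduce-left", "reduce-left"])
print(good, dependency_hamming_loss(gold_heads, good))
print(bad, dependency_hamming_loss(gold_heads, bad))
```

The second action sequence attaches "the" to "sat" instead of "cat", which costs one unit of Hamming loss.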
 
 
'''Machine Translation'''
 
* Framed task as a left-to-right translation problem.
 
* Search space over prefixes of translations.
 
* Actions add a word (or phrase) to the end of the existing translation.
 
* ''Loss function'': 1 - BLEU or 1 - NIST
 
* ''Optimal policy'': given a set of reference translations R and an English translation prefix e_1, ..., e_{i-1}, decide which word (or phrase) should be produced next, or whether the translation is finished.
 
 
== Related papers ==
 
 
* '''Search-based Structured Prediction''': This is the journal version of the paper that introduces the [[UsesMethod::SEARN]] algorithm - [[RelatedPaper::Daume_et_al,_ML_2009]].
 

Latest revision as of 16:53, 8 October 2010