(7 intermediate revisions by the same user not shown) |
Line 1: |
− | == Citation ==
| |
| | | |
− | Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails:
| |
− | Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005
| |
− |
| |
− | == Online version ==
| |
− |
| |
− | [http://www.cs.cmu.edu/~einat/email.pdf Extracting Personal Names from Emails]
| |
− |
| |
− | == Summary ==
| |
− | Task: named entity recognition (NER) of personal names in emails
| |
− |
| |
− | Techniques: NER is treated as a sequence tagging problem, and a CRF model is used for the task.
| |
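Treating NER as tagging means assigning one label per token. The sketch below is only an illustration: the `B-PER`/`I-PER`/`O` tag set and the helper name `spans_to_tags` are assumptions, since this summary does not say which encoding the paper uses.

```python
def spans_to_tags(tokens, name_spans):
    """Turn (start, end) token spans (end exclusive) into one tag per token.

    The BIO tag names are an assumed encoding, not taken from the paper.
    """
    tags = ["O"] * len(tokens)
    for start, end in name_spans:
        tags[start] = "B-PER"            # first token of a personal name
        for i in range(start + 1, end):
            tags[i] = "I-PER"            # continuation of the same name
    return tags


tokens = ["Please", "forward", "this", "to", "William", "Cohen", "today"]
tags = spans_to_tags(tokens, [(4, 6)])
```

A sequence model such as a CRF would then be trained to predict `tags` from `tokens` and their features.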
− |
| |
− | Contribution:
| |
− | * email-specific feature set
| |
− |
| |
− | Repetitions of a name within a single document are more common in newswire text, while repetitions across multiple documents are more common in emails.
| |
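The two kinds of repetition can be measured separately. The following is a minimal sketch, assuming documents are already tokenized; the function name `repetition_stats` and the toy corpus are hypothetical and are not the paper's procedure.

```python
from collections import Counter


def repetition_stats(docs):
    """Count, per token, repetition within a document and across documents.

    docs: list of token lists (one list per document).
    Returns (within, across), where
      within[tok] = number of documents in which tok occurs more than once,
      across[tok] = number of documents in which tok occurs at least once.
    """
    within, across = Counter(), Counter()
    for doc in docs:
        for tok, count in Counter(doc).items():
            across[tok] += 1
            if count > 1:
                within[tok] += 1
    return within, across


# Toy corpus, invented for illustration only.
docs = [["Einat", "wrote", "Einat"], ["ask", "Einat"], ["ask", "Richard"]]
within, across = repetition_stats(docs)
```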
− |
| |
− | == Example SEARN Usage ==
| |
− |
| |
− | '''Sequence Labeling'''
| |
− | * Discussed SEARN's application to [[AddressesProblem::POS tagging]] and [[AddressesProblem::NP chunking]]
| |
− |
| |
− | ''Tagging''
| |
− | * Task is to produce a label sequence from an input sequence.
| |
− | * Search framed as left-to-right greedy search.
| |
− | * ''Loss function'': Hamming loss
| |
− | * Optimal Policy:
| |
− | [[File:op-tagging.png]]
| |
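The optimal-policy figure is not reproduced here. As a rough sketch, assuming the usual result that under Hamming loss the best next action is the true label of the next word regardless of earlier mistakes, the tagging setup could look like this (all function names are illustrative):

```python
def hamming_loss(predicted, gold):
    """Number of positions where the predicted label differs from the gold label."""
    return sum(p != g for p, g in zip(predicted, gold))


def optimal_policy(prefix, gold):
    """Best next label under Hamming loss: the gold label at the next position.

    Earlier mistakes in `prefix` cannot be undone, so they do not change the choice.
    """
    return gold[len(prefix)]


def greedy_decode(words, policy):
    """Left-to-right greedy search: one labeling decision per input word."""
    labels = []
    for _ in words:
        labels.append(policy(labels))
    return labels


words = ["the", "dog", "barks"]
gold = ["DT", "NN", "VBZ"]
decoded = greedy_decode(words, lambda prefix: optimal_policy(prefix, gold))
```

In SEARN the optimal policy is only available at training time; a learned classifier takes its place when decoding unseen input.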
− |
| |
− |
| |
− | ''NP Chunking''
| |
− | * Chunking is a joint segmentation and labeling problem.
| |
− | * ''Loss function'': F1 measure
| |
− | * Optimal Policy:
| |
− | [[File:op-chunking.png]]
| |
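To make the chunk-level F1 concrete, here is a small sketch that scores predicted chunks against gold chunks. It assumes BIO-encoded tags and exact-match chunk scoring, neither of which is stated in the summary above; a loss for training would presumably be 1 - F1.

```python
def extract_chunks(tags):
    """Return the set of (start, end, type) chunks in a BIO tag list (end exclusive)."""
    chunks, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):      # sentinel closes a trailing chunk
        continues = tag.startswith("I-") and start is not None and tag[2:] == kind
        if not continues:
            if start is not None:               # close the chunk that was open
                chunks.add((start, i, kind))
                start, kind = None, None
            if tag.startswith(("B-", "I-")):    # open a new chunk
                start, kind = i, tag[2:]
    return chunks


def chunk_f1(pred_tags, gold_tags):
    """F1 over exactly matching chunks (same boundaries and same type)."""
    pred, gold = extract_chunks(pred_tags), extract_chunks(gold_tags)
    correct = len(pred & gold)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold_tags = ["B-NP", "I-NP", "O", "B-NP"]
pred_tags = ["B-NP", "O", "O", "B-NP"]
score = chunk_f1(pred_tags, gold_tags)
```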
− |
| |
− | '''Parsing'''
| |
− | * Looked at dependency parsing with a shift-reduce framework.
| |
− | * ''Loss function'': Hamming loss over dependencies.
| |
− | * ''Decisions'': shift/reduce
| |
− | * ''Optimal Policy'':
| |
− | [[File:op-parsing.png]]
| |
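Hamming loss over dependencies can be read as the number of words attached to the wrong head. The sketch below assumes a parse is stored as a list of head indices with -1 for the root; that representation is a choice made for this illustration.

```python
def dependency_hamming_loss(pred_heads, gold_heads):
    """Count words whose predicted head differs from the gold head.

    heads[i] is the index of the head of word i, or -1 if word i is the root.
    """
    if len(pred_heads) != len(gold_heads):
        raise ValueError("parses must cover the same number of words")
    return sum(p != g for p, g in zip(pred_heads, gold_heads))


# "the dog barks": gold attaches "the" -> "dog" and "dog" -> "barks" (root).
gold_heads = [1, 2, -1]
pred_heads = [2, 2, -1]   # "the" wrongly attached to "barks"
loss = dependency_hamming_loss(pred_heads, gold_heads)
```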
− |
| |
− | '''Machine Translation'''
| |
− | * Framed task as a left-to-right translation problem.
| |
− | * Search space over prefixes of translations.
| |
− | * Actions add a word (or phrase) to the end of the existing translation.
| |
− | * ''Loss function'': 1 - BLEU or 1 - NIST
| |
− | * ''Optimal policy'': given a set of reference translations R and an English translation prefix e_1, ..., e_{i-1}, decide which word (or phrase) should be produced next, or whether the translation is finished.
| |
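The search over translation prefixes can be sketched as a loop that repeatedly asks a policy for the next word or a stop signal. Everything named here is hypothetical: `STOP`, `greedy_translate`, and in particular `follow_reference`, which simply copies one reference and is a toy stand-in, not the BLEU- or NIST-optimal policy described above.

```python
STOP = "</s>"  # hypothetical marker meaning "the translation is finished"


def greedy_translate(policy, max_len=20):
    """Build a translation left to right by extending the current prefix."""
    prefix = []
    while len(prefix) < max_len:
        action = policy(prefix)      # next word, or STOP
        if action == STOP:
            break
        prefix.append(action)
    return prefix


def follow_reference(reference):
    """Toy policy that copies a single reference translation word by word."""
    def policy(prefix):
        if len(prefix) < len(reference):
            return reference[len(prefix)]
        return STOP
    return policy


reference = ["the", "house", "is", "small"]
translation = greedy_translate(follow_reference(reference))
```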
− |
| |
− | == Related papers ==
| |
− |
| |
− | * '''Search-based Structured Prediction''': This is the journal version of the paper that introduces the [[UsesMethod::SEARN]] algorithm - [[RelatedPaper::Daume_et_al,_ML_2009]].
| |