Citation
Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005
Online version
Extracting Personal Names from Emails
Summary
Task: NER from emails
Techniques: NER is treated as a sequence tagging problem; a CRF model is used for this task.
Contribution:
- email-specific feature set
Repetitions within a single document are more common in newswire text, while repetitions across multiple documents are more common in email (see the feature sketch below).
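A minimal sketch of what email-specific features of this kind might look like for a CRF tagger, assuming per-token feature dictionaries of the sort most CRF toolkits accept; the function name, feature names, and count dictionaries are illustrative, not the paper's actual feature set:

```python
# Illustrative per-token features for CRF-style NER over email text
# (not the paper's exact feature set).

def token_features(tokens, i, doc_counts, corpus_counts):
    """Features for tokens[i]; doc_counts / corpus_counts map a lowercased
    token to how often it occurs in this email / in other emails."""
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),          # capitalization cue for names
        "word.isupper": w.isupper(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Repetition cues: in-document repetition is more typical of newswire,
        # repetition across emails is more typical of email corpora.
        "repeats.in.doc": doc_counts.get(w.lower(), 0) > 1,
        "repeats.across.docs": corpus_counts.get(w.lower(), 0) > 1,
    }

if __name__ == "__main__":
    tokens = ["Hi", "Richard", ",", "see", "you", "tomorrow"]
    doc_counts = {"richard": 1}
    corpus_counts = {"richard": 7}   # e.g. the name recurs in other emails
    print(token_features(tokens, 1, doc_counts, corpus_counts))
```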
Example SEARN Usage
Sequence Labeling
- Discussed SEARN's application to POS tagging and NP chunking
Tagging
- Task is to produce a label sequence from an input sequence.
- Search framed as left-to-right greedy search.
- Loss function: Hamming loss
- Optimal Policy: under Hamming loss, output the true label for the current word (see the sketch below).
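A short sketch of the greedy left-to-right tagging framing above; the function names are illustrative, and the learned policy is stood in for by any classifier that maps (words, position, labels so far) to a label:

```python
# Sketch of SEARN-style left-to-right greedy tagging (illustrative names).

def greedy_tag(words, policy):
    """Left-to-right greedy search: commit to one label per position."""
    labels = []
    for i in range(len(words)):
        labels.append(policy(words, i, labels))
    return labels

def hamming_loss(pred, gold):
    """One unit of loss per mislabeled position."""
    return sum(p != g for p, g in zip(pred, gold))

def optimal_policy(gold):
    """Under Hamming loss, the optimal policy simply emits the true label."""
    return lambda words, i, labels_so_far: gold[i]

if __name__ == "__main__":
    words = ["Flights", "to", "Boston"]
    gold = ["NNS", "TO", "NNP"]
    pred = greedy_tag(words, optimal_policy(gold))
    print(pred, hamming_loss(pred, gold))   # ['NNS', 'TO', 'NNP'] 0
```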
NP Chunking
- Chunking is a joint segmentation and labeling problem.
- Loss function: F1 measure over chunks (a span-level F1 sketch follows this list)
- Optimal Policy:
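As a rough illustration of the F1-based loss (not the paper's code), chunk-level F1 can be computed over labeled spans, with the loss for a complete chunking taken as 1 - F1:

```python
# Chunk-level F1 over labeled spans, each chunk given as (start, end, label).

def chunk_f1(pred_chunks, gold_chunks):
    pred, gold = set(pred_chunks), set(gold_chunks)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)                     # exactly matching chunks
    precision = tp / len(pred)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    gold = [(0, 2, "NP"), (3, 5, "NP")]
    pred = [(0, 2, "NP"), (3, 4, "NP")]       # one chunk boundary is wrong
    print("loss =", 1.0 - chunk_f1(pred, gold))   # loss = 0.5
```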
Parsing
- Looked at dependency parsing with a shift-reduce framework.
- Loss function: Hamming loss over dependencies.
- Decisions: shift/reduce (see the transition sketch below)
- Optimal Policy:
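A skeletal sketch of the shift-reduce framing, using an arc-standard-style transition system with shift/left/right actions (a common variant; the exact transition system discussed may differ) together with Hamming loss over per-word heads. All names here are illustrative:

```python
# Illustrative shift-reduce dependency parsing skeleton (not the paper's
# parser): greedy search over transitions plus Hamming loss over heads.

def greedy_parse(n_words, policy):
    """heads[i] is the index of word i's head, or -1 if unattached (root)."""
    heads = [-1] * n_words
    stack, buf = [], list(range(n_words))
    while buf or len(stack) > 1:
        action = policy(stack, buf, heads)
        if action == "shift" and buf:
            stack.append(buf.pop(0))
        elif action == "left" and len(stack) >= 2:
            dep = stack.pop(-2)        # word beneath the top ...
            heads[dep] = stack[-1]     # ... becomes a dependent of the top
        elif action == "right" and len(stack) >= 2:
            dep = stack.pop()          # top of the stack ...
            heads[dep] = stack[-1]     # ... becomes a dependent of the word beneath
        else:
            break                      # no legal action left
    return heads

def dependency_hamming_loss(pred_heads, gold_heads):
    """One unit of loss per word whose predicted head is wrong."""
    return sum(p != g for p, g in zip(pred_heads, gold_heads))

if __name__ == "__main__":
    # Toy sentence "a cat slept": a -> cat, cat -> slept, slept is the root.
    gold = [1, 2, -1]
    script = iter(["shift", "shift", "left", "shift", "left"])
    pred = greedy_parse(3, lambda stack, buf, heads: next(script))
    print(pred, dependency_hamming_loss(pred, gold))   # [1, 2, -1] 0
```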
Machine Translation
- Framed task as a left-to-right translation problem.
- Search space over prefixes of translations.
- Actions add a word (or phrase) to the end of the existing translation.
- Loss function: 1 - BLEU or 1 - NIST
- Optimal policy: given a set of reference translations R and an English prefix e_1, ..., e_{i-1}, decide which word (or phrase) to produce next, or whether the translation is finished (sketched below).
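A sketch of this framing under loose assumptions: the state is a translation prefix, an action appends a word (or phrase), and the loss on a finished output is 1 minus a BLEU-style score. The bleu_like function below is a crude unigram-overlap stand-in for real BLEU/NIST, and all names are illustrative:

```python
# Illustrative left-to-right MT search over translation prefixes.

def bleu_like(hypothesis, reference):
    """Toy stand-in for BLEU: unigram precision with a brevity penalty."""
    if not hypothesis:
        return 0.0
    matches = sum(1 for w in hypothesis if w in reference)
    precision = matches / len(hypothesis)
    brevity = min(1.0, len(hypothesis) / len(reference))
    return precision * brevity

def greedy_translate(source, policy, max_len=50):
    """Grow a translation prefix one action at a time."""
    prefix = []
    while len(prefix) < max_len:
        action = policy(source, prefix)       # a word/phrase, or None to stop
        if action is None:
            break
        prefix.extend(action)
    return prefix

if __name__ == "__main__":
    reference = "the cat sat on the mat".split()

    # Toy "optimal" policy that reads off the reference: given the prefix
    # e_1 ... e_{i-1}, emit the next reference word, or stop when done.
    def follow_reference(source, prefix):
        return [reference[len(prefix)]] if len(prefix) < len(reference) else None

    hyp = greedy_translate("le chat ...".split(), follow_reference)
    print(hyp, "loss =", 1.0 - bleu_like(hyp, reference))   # loss = 0.0
```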
Related papers
- Search-based Structured Prediction: the journal version of the paper that introduces the SEARN algorithm - Daume_et_al,_ML_2009.