User talk:Xxiong

Citation

Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005

Online version

Extracting Personal Names from Emails

Summary

Task: NER (extracting personal names) from emails

Techniques: NER is treated as a sequence tagging problem, and a conditional random field (CRF) model is used for the task.

Contribution:

  • an email-specific feature set
  • the observation that repeated mentions of a name tend to occur within a single document in newswire text, but across multiple documents (messages) in email
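
To make the summary concrete, here is a minimal Python sketch of what email-specific token features for a CRF name tagger could look like. The particular features (header flag, greeting/sign-off cues, toy name dictionary) and function names are illustrative assumptions, not the paper's actual feature set.

# Hypothetical sketch of per-token features for a CRF name tagger on email text.
# The concrete features below are illustrative assumptions, not the paper's
# actual feature set.

COMMON_FIRST_NAMES = {"william", "richard", "einat"}   # toy dictionary

def token_features(tokens, i, in_header):
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "in_name_dictionary": tok.lower() in COMMON_FIRST_NAMES,
        # email-specific cues
        "in_header": in_header,                        # e.g. a From:/To:/Cc: line
        "follows_greeting": i > 0 and tokens[i - 1].lower() in {"hi", "dear", "hello"},
        "follows_signoff": i > 0 and tokens[i - 1].lower().rstrip(",") in {"thanks", "regards", "best"},
    }
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i + 1 < len(tokens):
        feats["next_lower"] = tokens[i + 1].lower()
    return feats

if __name__ == "__main__":
    body = "Hi William , the draft is attached .".split()
    print(token_features(body, 1, in_header=False))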

Example SEARN Usage

Sequence Labeling

Tagging

  • Task is to produce a label sequence from an input sequence.
  • Search is framed as greedy left-to-right decoding.
  • Loss function: Hamming loss
  • Optimal Policy:

[Figure: Op-tagging.png (optimal policy for tagging)]
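
To make the bullets above concrete, here is a minimal Python sketch of left-to-right greedy tagging with the Hamming-loss optimal policy, which simply emits the gold label at each position. The function names and the toy example are assumptions for illustration, not the authors' implementation.

# Minimal sketch of left-to-right greedy tagging in the SEARN setting.

def optimal_policy(gold_labels, i, partial_output):
    # Under Hamming loss, the optimal policy ignores earlier mistakes and simply
    # emits the gold label for position i.
    return gold_labels[i]

def hamming_loss(predicted, gold):
    return sum(p != g for p, g in zip(predicted, gold))

def greedy_decode(tokens, decide):
    # Left-to-right greedy search: each decision conditions on the input tokens
    # and on the labels already produced.
    output = []
    for i in range(len(tokens)):
        output.append(decide(tokens, i, output))
    return output

if __name__ == "__main__":
    tokens = ["Hi", "William", ",", "thanks"]
    gold = ["O", "PERSON", "O", "O"]
    # Using the optimal policy itself as the decision function gives zero loss.
    pred = greedy_decode(tokens, lambda toks, i, out: optimal_policy(gold, i, out))
    print(pred, hamming_loss(pred, gold))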


NP Chunking

  • Chunking is a joint segmentation and labeling problem.
  • Loss function: F1 measure
  • Optimal Policy:

[Figure: Op-chunking.png (optimal policy for NP chunking)]
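
As a rough illustration of the chunking loss, the sketch below computes chunk-level F1 from BIO tag sequences; SEARN would use 1 - F1 as the loss on a complete output. The BIO encoding and the helper names are assumptions for illustration.

# Sketch of chunk-level F1 from BIO tag sequences.

def extract_chunks(tags):
    # Return the set of (start, end, type) spans encoded by a BIO tag sequence.
    chunks, start, ctype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):       # sentinel closes a trailing chunk
        inside = tag.startswith("I-") and tag[2:] == ctype and start is not None
        if not inside and start is not None:           # current chunk ends here
            chunks.add((start, i, ctype))
            start, ctype = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, ctype = i, tag[2:]
    return chunks

def chunk_f1(pred_tags, gold_tags):
    pred, gold = extract_chunks(pred_tags), extract_chunks(gold_tags)
    correct = len(pred & gold)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

if __name__ == "__main__":
    gold = ["B-NP", "I-NP", "O", "B-NP"]
    pred = ["B-NP", "I-NP", "O", "O"]
    print(chunk_f1(pred, gold))                        # 0.666..., so the loss is 1 - F1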

Parsing

  • Looked at dependency parsing with a shift-reduce framework.
  • Loss function: Hamming loss over dependencies.
  • Decisions: shift/reduce
  • Optimal Policy:

[Figure: Op-parsing.png (optimal policy for dependency parsing)]
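
Below is a minimal shift-reduce sketch in the arc-standard style, together with Hamming loss over dependency heads. The exact transition system, the state representation, and the toy sentence are assumptions for illustration and may differ from the manuscript's setup.

# Minimal arc-standard-style shift-reduce sketch. State = (stack, buffer, arcs),
# where an arc (h, d) means word d has head h.

def step(state, action):
    stack, buffer, arcs = state
    if action == "SHIFT":
        return (stack + [buffer[0]], buffer[1:], arcs)
    if action == "LEFT-ARC":       # second-from-top takes the top as its head
        head, dep = stack[-1], stack[-2]
        return (stack[:-2] + [head], buffer, arcs | {(head, dep)})
    if action == "RIGHT-ARC":      # top takes the second-from-top as its head
        head, dep = stack[-2], stack[-1]
        return (stack[:-1], buffer, arcs | {(head, dep)})
    raise ValueError(action)

def hamming_loss_over_deps(pred_arcs, gold_arcs, n_words):
    # Number of words whose predicted head differs from the gold head.
    pred_head = {d: h for h, d in pred_arcs}
    gold_head = {d: h for h, d in gold_arcs}
    return sum(pred_head.get(w) != gold_head.get(w) for w in range(1, n_words + 1))

if __name__ == "__main__":
    # Words are indexed 1..n; 0 is the root. Toy sentence: "dogs bark"
    state = ([0], [1, 2], set())
    for action in ["SHIFT", "SHIFT", "LEFT-ARC", "RIGHT-ARC"]:
        state = step(state, action)
    pred = state[2]                                    # {(2, 1), (0, 2)}
    gold = {(2, 1), (0, 2)}
    print(pred, hamming_loss_over_deps(pred, gold, n_words=2))   # loss 0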

Machine Translation

  • Framed task as a left-to-right translation problem.
  • Search space over prefixes of translations.
  • Actions add a word (or phrase) to the end of the existing translation.
  • Loss function: 1 - BLEU or 1 - NIST
  • Optimal policy: given a set of reference translations R and an English translation prefix e_1, ..., e_{i-1}, decide which word (or phrase) should be produced next, or whether the translation is finished.
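
The sketch below illustrates this left-to-right setup with a crude heuristic policy that extends the prefix by copying the next word of the best-matching reference. It is only an illustrative approximation (the names and data are assumptions), since the truly optimal policy with respect to BLEU/NIST is not available in closed form; computing 1 - BLEU on the finished output is omitted for brevity.

# Toy sketch of the left-to-right MT setup: the state is an English prefix, and an
# action appends a word or declares the translation finished. The heuristic policy
# below is an illustrative approximation, not the optimal policy for BLEU.

EOS = "</s>"

def next_word_policy(prefix, references):
    # Pick the reference that best matches the prefix so far, then copy its next
    # word; declare the translation finished when that reference is exhausted.
    def overlap(ref):
        return sum(p == r for p, r in zip(prefix, ref))
    best = max(references, key=overlap)
    return best[len(prefix)] if len(prefix) < len(best) else EOS

def greedy_translate(references, max_len=20):
    prefix = []
    for _ in range(max_len):
        word = next_word_policy(prefix, references)
        if word == EOS:
            break
        prefix.append(word)
    return prefix

if __name__ == "__main__":
    refs = [["the", "cat", "sat"], ["a", "cat", "sat", "down"]]
    print(greedy_translate(refs))   # follows one of the references word by word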

Related papers

  • Search-based Structured Prediction: This is the journal version of the paper that introduces the SEARN algorithm - Daume_et_al,_ML_2009.