User talk:Xxiong

Citation

Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005

Online version

Extracting Personal Names from Emails

Summary

Task: extract person names from emails

Techniques: NER is treated as a token-tagging problem, and a conditional random field (CRF) model is used as the tagger.
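To illustrate what "NER as tagging" means, here is a minimal sketch that turns annotated name spans into one tag per token, which is the form a sequence model such as a CRF is trained on. The `B-PER`/`I-PER`/`O` tag names and the `spans_to_tags` helper are assumptions made for this example and are not taken from the paper.

```python
def spans_to_tags(tokens, name_spans):
    """Convert (start, end) name spans, end exclusive, into one tag per token."""
    tags = ["O"] * len(tokens)          # "O" = token is outside any name
    for start, end in name_spans:
        tags[start] = "B-PER"           # first token of a personal name
        for i in range(start + 1, end):
            tags[i] = "I-PER"           # continuation of the same name
    return tags


# Hypothetical email sentence with one annotated name covering tokens 4 and 5.
tokens = ["Please", "forward", "this", "to", "Richard", "Wang", "today"]
tags = spans_to_tags(tokens, [(4, 6)])
```

The tagger then predicts one such tag for each token, and maximal runs of name tags are read back as extracted names.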

Contribution:

  • email-specific feature set.
  • The authors found that name repetitions within a single document are more common in newswire text, whereas repetitions across multiple documents are more common in emails.
  • Recall-enhancing techniques based on this discovery.

Recall-enhancing Techniques:

  • Single document repetition (SDR): mark repeated occurrences of a token within a single document as a name.
  • Multiple document repetition (MDR): mark repeated occurrences of a token across multiple documents as a name.
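The two techniques above can be sketched roughly as follows. This is only an illustration under stated assumptions: a base tagger is assumed to have already produced a True/False name label per token, matching is assumed to be case-insensitive, and the `min_docs` threshold is a hypothetical parameter. None of these details comes from the summary above.

```python
from collections import defaultdict


def sdr(doc_tokens, doc_labels):
    """Single document repetition: if a token was labeled as a name anywhere
    in the document, label every occurrence of it in that document."""
    names = {t.lower() for t, is_name in zip(doc_tokens, doc_labels) if is_name}
    return [t.lower() in names for t in doc_tokens]


def mdr(corpus_tokens, corpus_labels, min_docs=2):
    """Multiple document repetition: collect tokens labeled as names in at
    least `min_docs` documents (hypothetical threshold) and label every
    occurrence of them across the corpus, keeping the original labels."""
    doc_count = defaultdict(int)
    for tokens, labels in zip(corpus_tokens, corpus_labels):
        names_in_doc = {t.lower() for t, is_name in zip(tokens, labels) if is_name}
        for name in names_in_doc:
            doc_count[name] += 1        # count documents, not occurrences
    dictionary = {name for name, count in doc_count.items() if count >= min_docs}
    return [
        [is_name or t.lower() in dictionary for t, is_name in zip(tokens, labels)]
        for tokens, labels in zip(corpus_tokens, corpus_labels)
    ]


# Hypothetical base-tagger output: the second "einat" was missed.
doc = ["Hi", "Einat", ",", "thanks", "einat"]
doc_labels = [False, True, False, False, False]
sdr_labels = sdr(doc, doc_labels)

# Hypothetical three-email corpus: "william" is missed in the third email.
corpus = [["William", "called"], ["ask", "William"], ["william", "said", "Richard"]]
corpus_labels = [[True, False], [False, True], [False, False, True]]
mdr_labels = mdr(corpus, corpus_labels)
```

In this sketch both functions can only add name labels, never remove them, which is why they are expected to raise recall, possibly at some cost in precision.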

Related papers

  • Search-based Structured Prediction: This is the journal version of the paper that introduces the SEARN algorithm - Daume_et_al,_ML_2009.