Difference between revisions of "User talk:Xxiong"

From Cohen Courses
== Citation ==
 
  
Einat Minkov, Richard C. Wang & William W. Cohen, Extracting Personal Names from Emails: Applying Named Entity Recognition to Informal Text, in HLT/EMNLP 2005
 
 
== Online version ==
 
 
[http://www.cs.cmu.edu/~einat/email.pdf Extracting Personal Names from Emails]
 
 
== Summary ==
 
Task: extract person names from emails
 
 
Techniques: NER is treated as a sequence tagging problem, and a CRF model is used for the task.
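
To make the tagging formulation concrete, here is a minimal sketch of how annotated name spans could be turned into per-token labels that a sequence model such as a CRF is trained to predict. The two-label scheme ("NAME" / "O") and the helper name spans_to_tags are assumptions for illustration; this page does not state which tag set the paper uses.

```python
def spans_to_tags(tokens, name_spans):
    """Convert name spans into one label per token.

    tokens: list of token strings.
    name_spans: list of (start, end) token indices, end exclusive.
    The "NAME"/"O" label set is an illustrative assumption.
    """
    tags = ["O"] * len(tokens)
    for start, end in name_spans:
        for i in range(start, end):
            tags[i] = "NAME"
    return tags


tokens = ["Hi", "Richard", "Wang", ",", "see", "you"]
tags = spans_to_tags(tokens, [(1, 3)])
```

A tagger then predicts one such label per token, and contiguous "NAME" tokens are read back as extracted names.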
 
 
Contribution:
 
* An email-specific feature set.

* The authors found that repetitions within a single document are more common in newswire text, while repetitions across multiple documents are more common in emails.

Based on this finding, the authors introduced a new recall-enhancing method that is suited to emails.
 
 
Recall-enhancing Techniques:
 
* single document repetition (SDR): mark tokens that repeat within a single document as names.

* multiple document repetition (MDR): mark tokens that repeat across multiple documents as names.

* inferred dictionaries: build a dictionary from the preliminary names predicted by an extractor learned from training data. Then filter the dictionary using predicted frequency (PF) and inverse document frequency (IDF). Words with low PF.IDF scores are either highly ambiguous in the corpus or common words that the extractor inaccurately predicted as names.

** PF: the ratio between the number of times a word is predicted as part of a name and the total number of occurrences of that word.

** IDF: inverse document frequency, which is higher for words that appear in fewer documents, so common words score low.
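
The PF.IDF filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the lowercasing of tokens, the particular IDF normalization, and the threshold value are all assumptions.

```python
import math
from collections import Counter


def pf_idf_scores(docs, predicted):
    """Score each word that was predicted as a name at least once.

    docs: list of documents, each a list of tokens.
    predicted: parallel list of boolean lists, True where the
        preliminary extractor marked the token as part of a name.
    """
    term_count = Counter()   # total occurrences of each word
    pred_count = Counter()   # occurrences predicted as part of a name
    doc_freq = Counter()     # number of documents containing the word
    for tokens, flags in zip(docs, predicted):
        seen = set()
        for tok, is_name in zip(tokens, flags):
            w = tok.lower()  # assumption: case-insensitive matching
            term_count[w] += 1
            if is_name:
                pred_count[w] += 1
            seen.add(w)
        doc_freq.update(seen)

    n_docs = len(docs)
    scores = {}
    for w, n_pred in pred_count.items():
        pf = n_pred / term_count[w]
        # Assumed normalized IDF; other IDF variants would also fit here.
        idf = math.log((n_docs + 0.5) / doc_freq[w]) / math.log(n_docs + 1)
        scores[w] = pf * idf
    return scores


def infer_dictionary(docs, predicted, threshold=0.3):
    """Keep words whose PF.IDF score reaches the (assumed) threshold."""
    scores = pf_idf_scores(docs, predicted)
    return {w for w, s in scores.items() if s >= threshold}


docs = [
    ["Hi", "Einat", "will", "call"],
    ["Einat", "said", "will", "do"],
    ["thanks", "will"],
]
predicted = [
    [False, True, True, False],   # "will" is a false positive here
    [True, False, False, False],
    [False, False],
]
name_dict = infer_dictionary(docs, predicted)
```

In this toy corpus, "will" is predicted as a name only once out of three occurrences and appears in every document, so its score falls below the threshold and it is dropped, while "einat" is kept.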
 
 
== Related papers ==
 
 
* '''Search-based Structured Prediction''': This is the journal version of the paper that introduces the [[UsesMethod::SEARN]] algorithm - [[RelatedPaper::Daume_et_al,_ML_2009]].
 

Latest revision as of 16:53, 8 October 2010