Difference between revisions of "Named Entity Recognition"

From Cohen Courses
 
(5 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
== Summary ==
 
  
Named Entity Recognition (or NER for short) is a [[category::problem]] in the field of information extraction that which looks at identifying atomic elements (entities) in text and classifying them into predefined classes such as person names, organizations, locations, dates, etc. Various named entity type hierarchies have been proposed in the literature, such as [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html BNN's categories] (used in Question Answering) and [http://nlp.cs.nyu.edu/ene/ Sekine's Extended Named Entity Hierarchy]
+
Named Entity Recognition (or NER for short) is a [[category::problem]] in the field of information extraction that involves identifying atomic elements (entities) in text and classifying them into predefined classes such as person names, organizations, locations, and dates. Various named entity type hierarchies have been proposed in the literature, such as [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html BBN's categories] (used in Question Answering) and [http://nlp.cs.nyu.edu/ene/ Sekine's Extended Named Entity Hierarchy].
  
 
== Common Approaches ==
 
  
 
Some common models for named entity recognition include the following:
 
* Lexicons
+
* '''Lexicons'''
 
** Checks if a token is part of a predefined set
 
* Classifying pre-segmented candidates
+
* '''Classifying pre-segmented candidates'''
** Manually select candidates
+
** Manually select candidates, then use YFCL on a piece of text to determine what type of entity it is
** Use YFCL on a piece of text to deterimine what type of entity it is
+
* '''Sliding Window'''
* Sliding Window
+
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold
** Try all reasonable token windows (different lengths and positions)
+
* '''Token Tagging / Sequential'''
** Train a [[UsesMethod::Naive Bayes]] classifier or YFCL
+
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[UsesMethod::Conditional Random Fields]].
** Extract text if Pr(class=+|prefix, contents, suffix) > some threshold
 
* Boundary Models
 
* Token Tagging  
 
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]] or [[Uses:Method::Conditional Random Fields]].  
 
  
 
== Example Systems ==
 
* ...
+
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]
 +
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]
  
 
== References / Links ==
 
 +
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]
 +
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]
 
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]
 
  

Latest revision as of 18:18, 1 February 2011

Summary

Named Entity Recognition (or NER for short) is a problem in the field of information extraction that involves identifying atomic elements (entities) in text and classifying them into predefined classes such as person names, organizations, locations, and dates. Various named entity type hierarchies have been proposed in the literature, such as BBN's categories (used in Question Answering) and Sekine's Extended Named Entity Hierarchy.

Common Approaches

Some common models for named entity recognition include the following:

  • Lexicons
    • Checks if a token is part of a predefined set
  • Classifying pre-segmented candidates
    • Manually select candidates, then use YFCL on a piece of text to determine what type of entity it is
  • Sliding Window
    • Try all reasonable token windows (different lengths and positions), train a Naive Bayes classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold
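  • Token Tagging / Sequential
    • Classify tokens sequentially, with models like Hidden Markov Models, Maximum Entropy Markov Models, or Conditional Random Fields.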
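
The lexicon and sliding-window approaches listed above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of any particular system: the feature set (prefix, contents, suffix, capitalization), the add-one smoothing, and all function and class names (`lexicon_matches`, `all_windows`, `window_features`, `NaiveBayes`, `train_on`, `extract`) are assumptions made for the example.

```python
import math
from collections import Counter


def lexicon_matches(tokens, lexicon):
    """Lexicon approach: return (index, token) for tokens in a predefined set."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok.lower() in lexicon]


def all_windows(tokens, max_len=3):
    """Yield (start, end) for every window of 1..max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end


def window_features(tokens, start, end):
    """Describe a window by its prefix token, its contents, and its suffix token."""
    prefix = tokens[start - 1].lower() if start > 0 else "<s>"
    suffix = tokens[end].lower() if end < len(tokens) else "</s>"
    feats = ["prefix=" + prefix, "suffix=" + suffix]
    for tok in tokens[start:end]:
        feats.append("word=" + tok.lower())
        feats.append("capitalized=" + str(tok[:1].isupper()))
    return feats


class NaiveBayes:
    """Tiny two-class Naive Bayes with add-one smoothing (hypothetical helper)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feat_counts = {True: Counter(), False: Counter()}
        self.vocab = set()

    def update(self, feats, label):
        self.class_counts[label] += 1
        self.feat_counts[label].update(feats)
        self.vocab.update(feats)

    def prob_positive(self, feats):
        """Return Pr(class=+ | features), computed in log space."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            denom = sum(self.feat_counts[label].values()) + len(self.vocab) + 1
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            for f in feats:
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        pos = math.exp(log_scores[True] - top)
        neg = math.exp(log_scores[False] - top)
        return pos / (pos + neg)


def train_on(sentences, max_len=3):
    """sentences: list of (tokens, set of gold (start, end) entity spans)."""
    model = NaiveBayes()
    for tokens, gold in sentences:
        for start, end in all_windows(tokens, max_len):
            model.update(window_features(tokens, start, end), (start, end) in gold)
    return model


def extract(model, tokens, threshold=0.5, max_len=3):
    """Keep every window whose Pr(class=+ | prefix, contents, suffix) > threshold."""
    found = []
    for start, end in all_windows(tokens, max_len):
        p = model.prob_positive(window_features(tokens, start, end))
        if p > threshold:
            found.append((" ".join(tokens[start:end]), p))
    return found
```

A model would be trained with `train_on` on sentences paired with their gold spans, then applied to new text with `extract`. Overlapping windows can both pass the threshold, so a real system would likely need a step that resolves overlaps.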

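For the token tagging / sequential approach, one possible sketch is a Hidden Markov Model tagger decoded with the Viterbi algorithm. The add-one smoothing, the `<s>` start symbol, the tag names in the usage note, and the function names (`train_hmm`, `log_prob`, `viterbi`) are assumptions for illustration only; Maximum Entropy Markov Models and Conditional Random Fields would replace the count-based probabilities with learned feature weights.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Collect transition and emission counts from sentences of (word, tag) pairs."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            vocab.add(word.lower())
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of key under a count table."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(tokens, model):
    """Return the most probable tag sequence for tokens under the HMM."""
    trans, emit, tags, vocab = model
    if not tokens:
        return []
    n_words = len(vocab) + 1  # one extra outcome reserved for unseen words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{
        t: (log_prob(trans["<s>"], t, len(tags))
            + log_prob(emit[t], tokens[0].lower(), n_words), None)
        for t in tags
    }]
    for i in range(1, len(tokens)):
        word = tokens[i].lower()
        column = {}
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], t, len(tags)))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_prob(emit[t], word, n_words), prev_tag)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]
```

For example, after training on two toy sentences tagged with hypothetical `PER`, `LOC`, and `O` labels, `viterbi("Bob visited Paris".split(), model)` returns one tag per token. Entities are then read off as runs of non-`O` tags.
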
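Example Systems

  • Stanford NER - http://nlp.stanford.edu/ner/index.shtml
  • Illinois Named Entity Tagger - http://cogcomp.cs.illinois.edu/page/software_view/4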

References / Links

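  • BBN Named Entity Types - http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html
  • Satoshi Sekine's Extended Named Entity Hierarchy - http://nlp.cs.nyu.edu/ene/
  • Wikipedia page on Named entity recognition - http://en.wikipedia.org/wiki/Named_entity_recognition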

Relevant Papers