Named Entity Recognition
From Cohen Courses
Revision as of 17:51, 30 November 2010 by PastStudents (talk | contribs)
Summary
Named Entity Recognition (NER) is a problem in the field of information extraction that involves identifying atomic elements (entities) in text and classifying them into predefined classes such as person names, organizations, locations, and dates. Various named entity type hierarchies have been proposed in the literature, such as BBN's categories (used in Question Answering) and Sekine's Extended Named Entity Hierarchy.
Common Approaches
Some common models for named entity recognition include the following:
- Lexicons
  - Check whether a token is part of a predefined set
- Classifying pre-segmented candidates
  - Manually select candidates, then use YFCL ("your favorite classifier learner") on a piece of text to determine what type of entity it is
- Sliding Window
  - Try all reasonable token windows (different lengths and positions), train a Naive Bayes classifier or YFCL, then extract a window's text if Pr(class=+ | prefix, contents, suffix) exceeds some threshold
- Token Tagging
  - Classify tokens sequentially, with models like Hidden Markov Models or Conditional Random Fields
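To make the approaches above more concrete, here is a minimal Python sketch of three of the ingredients: a lexicon lookup, the enumeration of candidate windows that a sliding-window classifier would score, and the per-token labels that a token tagger would be trained to predict. Everything named here is an assumption made for illustration: the toy `LEXICON`, the function names, the `max_len` parameter, and the B-/I-/O label encoding (a common convention, but not one this page specifies). No classifier is trained; the lexicon simply stands in for one.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lowercased token sequences mapped to entity types.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("william", "cohen"): "PERSON",
    ("pittsburgh",): "LOCATION",
    ("carnegie", "mellon", "university"): "ORGANIZATION",
}


def candidate_windows(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Enumerate every (start, end) token window of length 1..max_len.

    A sliding-window extractor would score each of these with a classifier
    and keep those whose probability exceeds a threshold.
    """
    n = len(tokens)
    return [
        (start, end)
        for start in range(n)
        for end in range(start + 1, min(start + max_len, n) + 1)
    ]


def lexicon_entities(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int, str]]:
    """Scan left to right, keeping the longest lexicon match at each position."""
    found = []
    i = 0
    while i < len(tokens):
        match = None
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + length])
            if key in LEXICON:
                match = (i, i + length, LEXICON[key])
                break
        if match is None:
            i += 1
        else:
            found.append(match)
            i = match[1]  # continue after the matched span
    return found


def to_bio(tokens: List[str], entities: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans to one label per token (B- begins, I- continues, O outside)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype
        for k in range(start + 1, end):
            tags[k] = "I-" + etype
    return tags


tokens = "William Cohen teaches at Carnegie Mellon University".split()
spans = lexicon_entities(tokens)
labels = to_bio(tokens, spans)
```

In this sketch, `labels` is the kind of target sequence a Hidden Markov Model or Conditional Random Field tagger would learn to produce from features of each token and its neighbors, while `candidate_windows(tokens)` lists the spans a sliding-window model would classify one at a time.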
Example Systems
- ...
References / Links
- BBN Named Entity Types - [1]
- Satoshi Sekine's Extended Named Entity Hierarchy - [2]
- Wikipedia page on Named entity recognition - [3]