Difference between revisions of "Named Entity Recognition"
From Cohen Courses
Revisions by: PastStudents
Revision as of 18:45, 30 November 2010
Summary
Named Entity Recognition (NER) is a problem in the field of information extraction: identifying atomic elements (entities) in text and classifying them into predefined classes such as person names, organizations, locations, and dates.
Common Approaches
Some common models for named entity recognition include the following:
- Lexicons
  - Check whether a token is part of a predefined set
- Classifying pre-segmented candidates
  - Manually select candidates
  - Use YFCL on a piece of text to determine what type of entity it is
- Sliding Window
  - Try all reasonable token windows (different lengths and positions)
  - Train a Naive Bayes classifier or YFCL
  - Extract the text if Pr(class=+|prefix, contents, suffix) exceeds some threshold
- Boundary Models
- Token Tagging
  - Classify tokens sequentially, with models such as Hidden Markov Models or Conditional Random Fields
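The sliding-window approach listed above can be illustrated with a minimal Python sketch. It enumerates every token window up to a maximum length, scores each one from its prefix, contents, and suffix, and keeps the windows whose score exceeds a threshold. The names `candidate_windows`, `toy_score`, and `extract`, along with the default threshold and window sizes, are assumptions made for illustration. In particular, `toy_score` is a hand-written stand-in for Pr(class=+|prefix, contents, suffix); a real system would use a trained classifier such as Naive Bayes instead.

```python
from typing import Callable, List, Tuple

Window = Tuple[int, int]  # (start, end) token offsets, end exclusive
Scorer = Callable[[List[str], List[str], List[str]], float]


def candidate_windows(n_tokens: int, max_len: int) -> List[Window]:
    """Enumerate every window of 1..max_len tokens at every position."""
    return [
        (start, start + length)
        for start in range(n_tokens)
        for length in range(1, max_len + 1)
        if start + length <= n_tokens
    ]


def toy_score(prefix: List[str], contents: List[str], suffix: List[str]) -> float:
    """Hypothetical stand-in for Pr(class=+ | prefix, contents, suffix).

    A trained classifier would go here; these hand-picked weights only
    mimic the kind of evidence such a model might learn for person names.
    """
    score = 0.0
    if all(tok[0].isupper() for tok in contents):
        score += 0.5  # every token in the window is capitalized
    if prefix and prefix[-1] in {"Mr.", "Dr.", "Prof."}:
        score += 0.4  # a title immediately precedes the window
    if suffix and suffix[0][0].isupper():
        score -= 0.3  # the window may cut a longer name short
    return max(0.0, min(1.0, score))


def extract(
    tokens: List[str],
    score: Scorer,
    threshold: float = 0.8,
    max_len: int = 3,
    context: int = 2,
) -> List[str]:
    """Return the text of every window whose score exceeds the threshold."""
    hits = []
    for start, end in candidate_windows(len(tokens), max_len):
        prefix = tokens[max(0, start - context):start]
        contents = tokens[start:end]
        suffix = tokens[end:end + context]
        if score(prefix, contents, suffix) > threshold:
            hits.append(" ".join(contents))
    return hits
```

For example, calling `extract("Yesterday Dr. Jane Smith visited Pittsburgh".split(), toy_score)` returns `["Jane Smith"]` under these toy weights. Overlapping windows can both pass the threshold in general, so a fuller system would also need a rule for choosing among them.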
Example Systems
- ...
References / Links
- Wikipedia page on Named entity recognition - http://en.wikipedia.org/wiki/Named_entity_recognition