Bikel et al MLJ 1999
== Citation ==
D. M. Bikel, R. L. Schwartz, and R. M. Weischedel. An algorithm that learns what's in a name. Machine Learning Journal, 34: 211-231, 1999.
== Summary ==
In this paper the authors present IdentiFinder, a Hidden Markov Model approach to the Named Entity Recognition problem. Most techniques used for Named Entity Recognition up to the time of the paper were based on handcrafted patterns that are heavily language dependent and do not adapt well to other kinds of input (speech transcripts, upper-case text, etc.).
This was the first paper to address Named Entity Recognition with HMMs, recognizing structure in the identification of named entities and formulating it as a classification problem in which every word is either part of one of the name classes or of none of them.
== Brief Description of the Method ==
Their solution has a separate model for each name class and a model for the not-a-name text. Additionally, there are two special states, START-OF-SENTENCE and END-OF-SENTENCE. The figure below provides a graphical representation of the model (the dashed edges make the graph complete).

[[File:BikelHmmGraph.png]]
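To make the state layout concrete, here is a minimal illustrative sketch (not the authors' code); the particular set of name classes is assumed to be the MUC-style one, and the completion edges of the figure correspond to allowing any region to follow any other:

<pre>
# Illustrative sketch of the state layout described above
# (assumed MUC-style name classes; not the authors' implementation).
NAME_CLASSES = ["PERSON", "ORGANIZATION", "LOCATION",
                "DATE", "TIME", "MONEY", "PERCENT"]
STATES = NAME_CLASSES + ["NOT-A-NAME"]
START, END = "START-OF-SENTENCE", "END-OF-SENTENCE"

# The graph is complete: any region may follow any other, so every
# (previous state, next state) pair carries a transition probability.
transitions = {(prev, nxt): 0.0
               for prev in [START] + STATES
               for nxt in STATES + [END]}
</pre>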
Each of the regions in the above graph was modeled with a different statistical bigram language model (the likelihood of words occurring within that region), meaning that each type of name is treated as a separate language with its own bigram probabilities. Formally, one is trying to find the most likely sequence of name classes <math>NC</math> given a sequence of words <math>W</math>:

<math>
\max_{NC} P(NC|W) = \max_{NC} \frac {P(W,NC)}{P(W)} = \max_{NC} P(W,NC)
</math>

The last equality holds because <math>P(W)</math> does not depend on the name-class sequence, so maximizing the joint probability <math>P(W,NC)</math> is sufficient.
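Schematically, the joint probability decomposes, under the bigram assumption above, into a name-class transition term, a term for the first word of each region, and within-region bigram terms; the rendering below is a simplification (the paper additionally conditions these probabilities on neighbouring words and classes):

<math>
P(W,NC) \approx \prod_{i} P(NC_i | NC_{i-1}) \cdot P(\left \langle w,f \right \rangle_1 | NC_i) \cdot \prod_{j>1} P(\left \langle w,f \right \rangle_j | \left \langle w,f \right \rangle_{j-1}, NC_i)
</math>

where <math>i</math> ranges over the name-class regions and <math>j</math> over the word positions inside region <math>i</math>. The search for the maximizing name-class sequence can then be carried out efficiently with Viterbi-style dynamic programming.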
Additionally, the authors represented words as two-element vectors: <math>\left \langle w,f \right \rangle</math> represents a word occurrence, where <math>w</math> is the text of the word and <math>f</math> is a feature assigned to it. The set of features, as well as the motivation behind them, can be found in the figure below.

[[File:BikelWordFeatures.png]]
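As an illustration of how such word features might be computed, here is a minimal sketch; the feature names and their priority order are assumptions for the example, not taken verbatim from the figure:

<pre>
import re

def word_feature(w, first_in_sentence=False):
    # Map a word to a single feature, checking more specific patterns first
    # (illustrative; not the paper's exact feature set or ordering).
    if re.fullmatch(r"\d{2}", w):
        return "twoDigitNum"              # e.g. "90": two-digit year
    if re.fullmatch(r"\d{4}", w):
        return "fourDigitNum"             # e.g. "1999": four-digit year
    if re.fullmatch(r"\d+\.\d+", w):
        return "containsDigitAndPeriod"   # e.g. "1.5": amount or percentage
    if any(c.isdigit() for c in w):
        return "otherNum"                 # any other number-like token
    if w.isupper():
        return "allCaps"                  # e.g. "BBN": likely an organization
    if first_in_sentence:
        return "firstWord"                # sentence-initial: capitalization uninformative
    if w[:1].isupper():
        return "initCap"                  # e.g. "Sally": capitalized word
    if w.islower():
        return "lowerCase"                # uncapitalized word
    return "other"                        # punctuation and everything else
</pre>

For example, <code>word_feature("1999")</code> returns <code>fourDigitNum</code> and <code>word_feature("Sally")</code> returns <code>initCap</code>; pairing the word text with this feature gives the two-element vector described above.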
== Results ==
* With only about 100k words of training data, the system already reaches roughly 90% performance.