Paper: Zhang and Johnson, CoNLL 2003

From Cohen Courses

Citation

A Robust Risk Minimization based Named Entity Recognition System, Tong Zhang and David Johnson, CoNLL-2003

Online Version

Here is the online version of the paper.

Summary

This paper describes a robust linear classification system for Named Entity Recognition. Earlier reports suggested that the accuracy (F1) of machine-learning-based systems was in the low 90s, but those studies were often performed on relatively restricted domains; the performance of a statistical named entity extraction system can vary significantly depending on the underlying domain. An advantage of the system proposed in this paper is that it can easily incorporate a large number of linguistic features. The main focus of the paper is to investigate the impact of local features and to show that system performance can be enhanced significantly with relatively simple token-based features. More sophisticated linguistic features, although helpful, yield much less improvement in system performance than might be expected.

This study provides useful insight into the usefulness of various local linguistic features. Since these simple features are readily available for many languages, it suggests the possibility of quickly setting up a language-independent Named Entity Recognition system whose performance is close to that of a system using much more sophisticated, language-dependent features.

Brief description of the method

The authors treat Named Entity Recognition as a sequential token-based tagging problem. The tokenized text is a sequence of tokens $\{w_i\}$, and the goal is to assign a class label $t_i$ to every token $w_i$. In this paper, the IOB1 encoding scheme provided in the CoNLL-2003 shared task is used. The system estimates the conditional probability $P(t_i = c \mid x_i)$, where $x_i$ is the feature vector at position $i$; $x_i$ may depend on previously predicted class labels $t_{i-1}, t_{i-2}, \ldots$, but the dependency is typically assumed to be local. The conditional probability model has the following parametric form:

$P(t_i = c \mid x_i) = T(w_c^T x_i + b_c),$

where $T(y) = \min(1, \max(0, y))$ is the truncation of $y$ into the interval $[0,1]$, $w_c$ is a linear weight vector, and $b_c$ is a constant. The training data consist of pairs $(x_i, t_i)$ for $i = 1, \ldots, n$. It was shown in Zhang et al., 2002 that such a model can be estimated by solving the following optimization problem for each class $c$:

$(\hat{w}_c, \hat{b}_c) = \arg\min_{w_c, b_c} \frac{1}{n} \sum_{i=1}^{n} f(w_c^T x_i + b_c, \, y_i^c),$

where $y_i^c = 1$ when $t_i = c$ and $y_i^c = -1$ otherwise. The loss function $f$ is defined as:

[Image: CoNLL2003 pic1.jpg — definition of the loss function f]

The authors call a classification method based on approximately minimizing this risk function robust risk minimization (RRM).
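
To make the estimation step concrete, below is a minimal NumPy sketch of a one-vs-rest linear classifier of this form. The class name, the plain SGD training loop, and the modified-Huber-style loss used here are illustrative assumptions, not the paper's exact procedure; the paper's loss f is the one shown in the figure above, and the paper uses its own optimization method.

    import numpy as np

    def truncate(scores):
        # T(y) = min(1, max(0, y)): clip linear scores into [0, 1].
        return np.clip(scores, 0.0, 1.0)

    class LinearClassConfidenceModel:
        """One-vs-rest linear model for a single class label c,
        P(t_i = c | x_i) ~ T(w_c . x_i + b_c).  (Hypothetical class name.)"""

        def __init__(self, dim, lr=0.1, epochs=20):
            self.w = np.zeros(dim)
            self.b = 0.0
            self.lr = lr
            self.epochs = epochs

        def fit(self, X, y):
            # X: (n, dim) feature matrix; y: labels in {+1, -1},
            # +1 when the token's true tag equals c, -1 otherwise.
            # Approximately minimize (1/n) * sum_i f(w.x_i + b, y_i) by SGD,
            # using a modified-Huber-style loss as a stand-in for the paper's f.
            for _ in range(self.epochs):
                for x_i, y_i in zip(X, y):
                    p = x_i @ self.w + self.b
                    py = p * y_i
                    if py < -1.0:          # linear region of the loss
                        grad_p = -4.0 * y_i
                    elif py <= 1.0:        # quadratic region, (1 - py)^2
                        grad_p = 2.0 * (py - 1.0) * y_i
                    else:                  # zero loss, zero gradient
                        grad_p = 0.0
                    self.w -= self.lr * grad_p * x_i
                    self.b -= self.lr * grad_p

        def predict_confidence(self, X):
            # Truncated linear score, interpreted as P(t_i = c | x_i).
            return truncate(X @ self.w + self.b)

At test time one such model per class label would be applied at each token position and the per-class scores combined along the sequence; the decoding procedure itself is not covered in this summary.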

Experimental Results

The authors study the performance of the system with different feature combinations on the English development set. Table 1 shows the features used, and Table 2 shows the results.

[Image: CoNLL2003 results.jpg — feature sets and results on the English development set]

The following points are worth noting:

  • Experiments 1 and 2 imply that tokens by themselves, whether or not they are represented as mixed-case text, do not significantly affect system performance.
  • Experiment 3 shows that even without case information, the performance of a statistical named entity recognition system can be greatly enhanced with token prefix and suffix information.
  • Experiment 4 suggests that capitalization is a very useful feature for mixed-case text.
  • Experiment 5 shows that token prefix and suffix information incorporating a character-based entity model enhances system performance further.
  • Experiment 6 adds POS and chunking information, which leads to only a relatively small improvement.
  • Experiment 7 shows that adding the four supplied dictionaries yields a small but statistically significant improvement.
  • Experiment 8 adds a number of additional dictionaries from different sources, which leads to a slight further improvement.

Most of the performance improvement can be achieved with relatively simple token features that are easy to construct. Although more sophisticated linguistic features are helpful, they provide much less improvement than might be expected.
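
As a concrete illustration of what such simple token-based features might look like, here is a small Python sketch that builds token, prefix/suffix, and capitalization features for a token and its immediate neighbours. The feature names, window size, and affix length are hypothetical choices for illustration, not the paper's exact feature templates.

    def token_features(tokens, i, affix_len=3):
        # Local features for token i: surface form, prefix/suffix strings,
        # simple capitalization/digit indicators, and neighbouring tokens.
        w = tokens[i]
        feats = {
            "tok=" + w.lower(): 1,
            "prefix=" + w[:affix_len].lower(): 1,
            "suffix=" + w[-affix_len:].lower(): 1,
            "init_cap": int(w[:1].isupper()),
            "all_caps": int(w.isupper()),
            "has_digit": int(any(ch.isdigit() for ch in w)),
        }
        if i > 0:
            feats["prev_tok=" + tokens[i - 1].lower()] = 1
        if i + 1 < len(tokens):
            feats["next_tok=" + tokens[i + 1].lower()] = 1
        return feats

    # Example usage on a short sentence:
    sentence = ["U.N.", "official", "Ekeus", "heads", "for", "Baghdad"]
    print(token_features(sentence, 0))

Features of this kind can be fed directly into the linear model sketched earlier, with one binary feature dimension per distinct feature string.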

Related papers

A similar risk minimization method is used in Zhang et al., 2002.