Liuy project abstract
Active Learning based Named Entity Recognition (AL-NER)
In this project, I plan to work on Active Learning based Named Entity Recognition (AL-NER). I focus on non-sequential token tagging approaches.
- Problem Definition
I formulate NER as the following active learning task: given a document d and a set S of name classes to recognize, actively learn a classifier that maps each name in d to a predefined class s in S.
- Dataset
I plan to evaluate AL-NER on a standard corpus of news articles -- MUC-6.
What I plan to do with the data
I use the tools FEX and SNoW for NE tagging: learning labels from phrases. FEX is used to represent examples in SNoW's input format, and SNoW is a supervised learning system. I need to write scripts to preprocess raw documents into input for FEX and SNoW, and to convert SNoW's output into labels in the desired format.
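The exact input formats expected by FEX and SNoW are tool-specific; as a rough sketch of the preprocessing step, here is how raw tokens could be mapped to sparse feature-index lines. The feature templates and the line format below are illustrative assumptions, not the actual FEX/SNoW formats:

```python
# Sketch of preprocessing: raw tokens -> sparse feature-index lines.
# NOTE: the feature templates and the output line format are illustrative
# assumptions; the real FEX/SNoW formats are defined by those tools.

def featurize(tokens, feature_ids):
    """Map simple surface features of each token to integer feature IDs."""
    lines = []
    for i, tok in enumerate(tokens):
        feats = [f"w={tok.lower()}",
                 f"cap={tok[0].isupper()}",
                 f"prev={tokens[i - 1].lower() if i > 0 else '<s>'}"]
        ids = []
        for f in feats:
            if f not in feature_ids:          # grow the feature lexicon on the fly
                feature_ids[f] = len(feature_ids)
            ids.append(feature_ids[f])
        lines.append(", ".join(str(j) for j in sorted(ids)) + ":")
    return lines

vocab = {}
for line in featurize(["John", "works", "at", "IBM"], vocab):
    print(line)
```

A real script would add the gold label as the first field of each example and richer feature templates (suffixes, gazetteer membership, etc.).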
Why I think it’s interesting
- NER is an important sub-process in information extraction.
As a phrase-level task, NER identifies occurrences of entity names in documents and classifies them into predefined categories: names of persons, organizations, dates, etc.
- Statistical learning-based recognizers can be adapted to new text genres and new name categories.
Training a statistical NER system (e.g., Max-Ent or SVM based) requires tagging occurrences of entity names in a large corpus.
- Active learning gives the learner control over which data get labeled, and thus can reduce annotation effort while maintaining accuracy.
- It is interesting to study the behavior of various active learning algorithms on the NER task, to understand which active learning method or selection criterion best reduces the labeling cost without sacrificing NER accuracy.
- Related work includes [1], [2], and [3].
Superpowers I have
- My research concentration is active learning: algorithmic work and theoretical perspectives.
- Experience using FEX and SNoW for named entity tagging.
Evaluation of my work
- I measure NER performance by precision, recall, and F-score, calculated from the six types of matches used in MUC between a predicted named entity and the answer key: correct, incorrect, partially correct, missing, spurious, and noncommittal.
- The base learners are naive Bayes, perceptron, Winnow, and SVM.
- I compare the number of examples needed to reach a given NER accuracy under active selection with the number needed under random selection.
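The MUC match types above combine into precision and recall with half credit for partial matches; a minimal sketch of that computation (noncommittal matches are conventionally excluded from scoring):

```python
# MUC-style scoring from five of the six match-type counts
# (noncommittal matches are left out of both denominators).
def muc_scores(correct, incorrect, partial, missing, spurious):
    possible = correct + incorrect + partial + missing   # entities in the key
    actual = correct + incorrect + partial + spurious    # entities predicted
    recall = (correct + 0.5 * partial) / possible
    precision = (correct + 0.5 * partial) / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = muc_scores(correct=80, incorrect=5, partial=10, missing=5, spurious=10)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```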
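To illustrate the active-vs-random comparison, here is a toy sketch on the classic 1-D threshold-learning problem, where active selection (binary search over the uncertain region) needs exponentially fewer labels than random selection to localize the decision boundary:

```python
import random

# Toy label-cost comparison: learning a threshold classifier on [0, 1].
random.seed(0)
TRUE_THRESHOLD = 0.37
label = lambda x: int(x >= TRUE_THRESHOLD)   # oracle we pay per query

def active_labels_needed(eps):
    """Binary search: always query the midpoint of the uncertain region."""
    lo, hi, n = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        n += 1
        if label(mid):
            hi = mid
        else:
            lo = mid
    return n

def random_labels_needed(eps):
    """Query uniformly random points until the uncertain region shrinks."""
    lo, hi, n = 0.0, 1.0, 0
    while hi - lo > eps:
        x = random.random()
        n += 1
        if label(x):
            hi = min(hi, x)
        elif x > lo:
            lo = x
    return n

print(active_labels_needed(0.01), random_labels_needed(0.01))
```

In the real experiments the x-axis of this comparison would be the number of labeled NE examples and the y-axis the MUC F-score of the trained tagger.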
What techniques I plan to use
For the active learning method, I explore the following algorithms:
- Uncertainty sampling [4], where the usefulness of an example is measured by the uncertainty of a single learner or the entropy of its output distribution.
- Query by committee [5], where the usefulness of an example is measured by the disagreement of a committee of learners, for example, vote entropy (disagreement among the winners).
- SVM-based methods [6], where the usefulness of an example is measured by its proximity to the separating hyperplane.
- IWAL (importance-weighted active learning) [7], which comes with theoretical guarantees.
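Two of the selection scores from the list above can be sketched directly; both favor the example whose predictions are most split:

```python
import math
from collections import Counter

def prediction_entropy(probs):
    """Uncertainty sampling: entropy of one learner's class distribution.
    Higher entropy means the learner is less sure about this example."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes, committee_size):
    """Query by committee: entropy of the committee's empirical vote
    distribution over predicted labels; 0 means unanimous agreement."""
    counts = Counter(votes)
    return -sum((c / committee_size) * math.log(c / committee_size)
                for c in counts.values())

print(vote_entropy(["PER", "ORG", "PER", "LOC"], 4))   # split committee, high score
print(vote_entropy(["PER", "PER", "PER", "PER"], 4))   # unanimous -> 0.0
```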
To decide which examples to pick, I will explore different criteria. Taking SVM-based active learning as an example, the closer an NE is to the separator, the more uncertain the example. I use dynamic time warping as a similarity measure between NEs, and take the highest-density NEs as the representatives (centroids obtained by clustering).
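Dynamic time warping over NE token sequences can be sketched with the standard DP recurrence; the per-element cost below (absolute difference of token lengths) is just an illustrative stand-in for a real token-level distance:

```python
# Dynamic time warping distance between two variable-length NE sequences,
# usable as the pairwise similarity for density-based representative selection.
def dtw(a, b, cost=lambda x, y: abs(x - y)):
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(a[i - 1], b[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # skip an element of a
                              D[i][j - 1],      # skip an element of b
                              D[i - 1][j - 1])  # align a[i-1] with b[j-1]
    return D[n][m]

# Token-length sequences of two NEs, e.g. "New York City" vs "New York".
print(dtw([3, 4, 4], [3, 4]))   # 0.0: the extra token aligns at no cost
```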
What question I want to answer
- Among the candidate base learners applied to NER -- perceptron, naive Bayes, and Winnow -- which performs best in classification accuracy and time efficiency under passive learning?
- For active learning, which selection criterion (uncertainty, representativeness, or diversity) yields the best NER performance, and does combining them help?
- Which active learning methods can effectively reduce the number of samples needed to achieve the desired NER accuracy?
Who I might work with
I propose this project on my own. Collaborators are welcome.
References
- [1] F. Olsson. Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora. Doctoral thesis, University of Gothenburg, 2008.
- [2] R. Reichart, K. Tomanek, U. Hahn, and A. Rappoport. Multi-Task Active Learning for Linguistic Annotations. ACL 2008.
- [3] K. Tomanek, F. Laws, U. Hahn, and H. Schütze. On Proper Unit Selection in Active Learning: Co-Selection Effects for Named Entity Recognition. HLT 2009.
- [4] D. D. Lewis and J. Catlett. Heterogeneous Uncertainty Sampling for Supervised Learning. ICML 1994.
- [5] H. S. Seung, M. Opper, and H. Sompolinsky. Query by Committee. COLT 1992.
- [6] G. Schohn and D. Cohn. Less Is More: Active Learning with Support Vector Machines. ICML 2000.
- [7] A. Beygelzimer, S. Dasgupta, and J. Langford. Importance-Weighted Active Learning. ICML 2009.