Class Meeting for 10-707 9/16/2009
From Cohen Courses
This is one of the class meetings on the schedule for the course Information Extraction 10-707 in Fall 2009.
NER as classification
Required Readings
- Information extraction from voicemail transcripts, by M. Jansche, S. P. Abney. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, 2002. For background, see the Huang et al. (2001) paper they compare against.
- Exploiting diverse knowledge sources via maximum entropy in named entity recognition, by A. Borthwick, J. Sterling, E. Agichtein, R. Grishman. In Proceedings of the Sixth Workshop on Very Large Corpora, 1998.
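As a minimal sketch of treating NER as classification, assuming a plain per-token setup: each token gets a tag from a maximum-entropy (multinomial logistic regression) classifier over binary features of the token and its neighbors. The feature templates, the tag set (`PER`, `LOC`, `O`), the toy sentences, and the function names (`token_features`, `train_maxent`, `predict`) are hypothetical, chosen only for illustration. The Borthwick et al. reading draws its features from many more knowledge sources, and this sketch adds no regularization and no sequence-level decoding.

```python
import math

def token_features(tokens, i):
    """Binary features for token i: the word, its capitalization, and its neighbors.
    (Hypothetical templates chosen for illustration.)"""
    w = tokens[i]
    return [
        "bias",
        "w=" + w.lower(),
        "cap=" + str(w[:1].isupper()),
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
    ]

def class_probs(weights, feats, labels):
    """Maxent distribution: softmax over labels of the summed weights of active features."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(sentences, epochs=50, lr=0.5):
    """Fit a multinomial logistic regression (maximum entropy) model by plain SGD."""
    labels = sorted({y for _, tags in sentences for y in tags})
    weights = {}
    for _ in range(epochs):
        for tokens, tags in sentences:
            for i, gold in enumerate(tags):
                feats = token_features(tokens, i)
                probs = class_probs(weights, feats, labels)
                # Log-likelihood gradient: observed minus expected feature counts.
                for y in labels:
                    g = (1.0 if y == gold else 0.0) - probs[y]
                    for f in feats:
                        weights[(f, y)] = weights.get((f, y), 0.0) + lr * g
    return weights, labels

def predict(weights, labels, tokens):
    """Classify each token independently; no sequence-level decoding."""
    tags = []
    for i in range(len(tokens)):
        probs = class_probs(weights, token_features(tokens, i), labels)
        tags.append(max(labels, key=probs.get))
    return tags

# Toy training data (invented for illustration): one tag per token.
train = [
    ("Mr. Smith visited Boston".split(), ["O", "PER", "O", "LOC"]),
    ("Ms. Jones left Paris".split(), ["O", "PER", "O", "LOC"]),
    ("Mr. Brown visited Paris".split(), ["O", "PER", "O", "LOC"]),
]
weights, labels = train_maxent(train)
print(predict(weights, labels, "Mr. Jones visited Boston".split()))
```

On this toy data the unseen sentence should come out as `O, PER, O, LOC`, because every context feature of each test token was seen in training only with that tag. Framing the task this way is what lets any classifier (maxent here, SVMs in the Takeuchi and Collier reading) be swapped in as the learner.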
Optional Readings
- Use of Support Vector Machines in Extended Named Entity Recognition, Takeuchi and Collier, CoNLL 2002. A biotext extraction task using a Borthwick-like approach, but with SVMs as the learner.
- Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron, Collins, ACL 2002. Reranking candidates using global features, similar in some ways to the Jansche/Abney paper.
- Unsupervised Models for Named Entity Classification, Collins and Singer, EMNLP 1999. A very nice paper on using a co-training-like approach to classifying named entity candidates.
- Boosted Wrapper Induction, Freitag and Kushmerick, AAAI 2000. A boosting approach to learning rules that identify NE boundaries.
- Understanding Captions in Biomedical Publications, Cohen et al., KDD 2002. Another candidate-classification task from a biotext domain.
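The Collins (2002) reading reranks candidate outputs from a baseline tagger using global features. The sketch below shows only the general shape of perceptron reranking and is not the paper's method: it uses the averaged perceptron, a common stand-in for the voted perceptron, and it counts a tie as a mistake. The feature names (`base_score`, `lowercase_entity`), the toy data, and the helper functions are hypothetical.

```python
from collections import defaultdict

def score(weights, feats):
    """Linear score of one candidate: dot product of weights with its sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_averaged_perceptron(examples, epochs=10):
    """Perceptron reranker with weight averaging.

    examples: list of (candidates, best), where candidates is a list of
    feature dicts (feature name -> value) and best is the index of the
    correct candidate. Returns the averaged weight vector as a dict.
    """
    w = defaultdict(float)
    total = defaultdict(float)  # running sum of w after every example, for averaging
    steps = 0
    for _ in range(epochs):
        for candidates, best in examples:
            rivals = [j for j in range(len(candidates)) if j != best]
            if rivals:
                rival = max(rivals, key=lambda j: score(w, candidates[j]))
                # Treat a tie as a mistake: update unless the correct candidate strictly wins.
                if score(w, candidates[rival]) >= score(w, candidates[best]):
                    for f, v in candidates[best].items():
                        w[f] += v
                    for f, v in candidates[rival].items():
                        w[f] -= v
            steps += 1
            for f, v in w.items():
                total[f] += v
    return {f: v / steps for f, v in total.items()}

def rerank(weights, candidates):
    """Return the index of the highest-scoring candidate."""
    return max(range(len(candidates)), key=lambda j: score(weights, candidates[j]))

# Invented toy data: "base_score" stands in for the baseline tagger's score,
# "lowercase_entity" for a global feature that fires when a candidate tags a
# lowercase word as a name. In the second example that feature marks the
# baseline's top candidate as wrong.
train = [
    ([{"base_score": 2.0}, {"base_score": 1.0}], 0),
    ([{"base_score": 2.0, "lowercase_entity": 1.0}, {"base_score": 1.0}], 1),
]
weights = train_averaged_perceptron(train)
print(weights)  # → {'base_score': 0.9, 'lowercase_entity': -1.8}
```

The learned weights keep trusting the baseline score while penalizing the global feature, so a candidate with a higher baseline score can still lose if it triggers `lowercase_entity`. Averaging the weights over all steps damps the oscillation of the raw perceptron updates.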