Cohen and Hersh Briefings in Bioinformatics 2005

Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics, 6(1):57-71.

Online version

oxfordjournals

Summary

This is a 2005 survey of biomedical text mining.

The authors describe the state of the art, as of 2005, for each of the text-mining tasks listed below.

Named entity recognition
- Problems
  - No complete dictionary exists for most types of biological named entities
  - Ambiguous words and phrases
  - Multiple names for the same entity
- Approaches
  - Lexicon-based
  - Rule-based
    - AbGene system of Tanabe and Wilbur
    - GAPSCORE system
  - Statistical
- Performance
  - Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
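To make the lexicon-based approach and the F-score figure concrete, here is a minimal Python sketch. The lexicon, sentence and gold annotation are hypothetical, invented for illustration, and none of them come from the survey. The tagger greedily matches the longest lexicon phrase at each token position, and the scorer computes exact-span precision P, recall R and the balanced F-score F1 = 2PR / (P + R), assuming that is the F-score variant meant above. Its two mistakes mirror the problems listed: the ambiguous "cat" is tagged as a gene, and "BRCA1", left out of the lexicon, is never found.

```python
# Toy lexicon-based NER with exact-match evaluation.
# The lexicon, sentence and gold spans below are hypothetical, made up for illustration.


def tag_with_lexicon(tokens, lexicon, max_len=3):
    """Greedy longest-match lookup of token n-grams in a lowercase lexicon.

    Returns a set of (start, end) token spans, with end exclusive.
    """
    spans = set()
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so multi-word names beat their substrings.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                match = (i, i + n)
                break
        if match:
            spans.add(match)
            i = match[1]  # skip past the matched phrase
        else:
            i += 1
    return spans


def precision_recall_f1(predicted, gold):
    """Exact-span precision, recall and balanced F-score F1 = 2PR / (P + R)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


if __name__ == "__main__":
    # "cat" stands in for a gene symbol that is also an ordinary English word;
    # "BRCA1" is deliberately left out to mimic an incomplete dictionary.
    lexicon = {"tumor necrosis factor", "p53", "cat"}
    tokens = "Tumor necrosis factor and p53 levels rose in the cat , but BRCA1 was unchanged".split()
    gold = {(0, 3), (4, 5), (12, 13)}  # hand-labelled gene/protein spans

    predicted = tag_with_lexicon(tokens, lexicon)
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(sorted(predicted))  # [(0, 3), (4, 5), (9, 10)]
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

Exact-span matching is the strictest scoring convention; an evaluation that also credits partially overlapping spans would give different numbers for the same output.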

Text classification

Synonym and abbreviation extraction

Relationship extraction

Hypothesis generation

Integration frameworks

Related papers
