Cohen and Hersh Briefings in Bioinformatics 2005


Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.

Online version

oxfordjournals

Summary

This survey paper reviews the field of biomedical text mining as of 2005.

For each of the distinct text-mining tasks below, the authors describe the state of the art at that time.

  • Named entity recognition
    • Problems
      • No complete dictionary exists for most types of biological named entities
      • Ambiguous words and phrases
      • Entities with multiple names
    • Approaches
      • Lexicon-based
      • Rule-based
        • AbGene system of Tanabe and Wilbur
        • GAPSCORE system
      • Statistically based
    • Performance
      • Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
  • Text classification
  • Synonym and abbreviation extraction
  • Relationship extraction
  • Hypothesis generation
  • Integration frameworks
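
The F-score used to report NER performance above is the harmonic mean of precision and recall. The following is a minimal sketch, not taken from the paper, that pairs a toy lexicon-based tagger with an F-score computation. The lexicon contents, the example sentence, and the function names tag_genes and f_score are all assumptions made for illustration. It also shows how an incomplete dictionary lowers recall: MMAC1 is an alias of PTEN, but because the toy lexicon lists only PTEN, the mention is missed.

```python
import re

# Hypothetical toy lexicon (assumption); real systems rely on curated resources.
GENE_LEXICON = {"BRCA1", "TP53", "PTEN"}


def tag_genes(text, lexicon=GENE_LEXICON):
    """Return the set of (start, end, token) spans whose token is in the lexicon."""
    spans = set()
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        if match.group() in lexicon:
            spans.add((match.start(), match.end(), match.group()))
    return spans


def f_score(predicted, gold):
    """Return (precision, recall, F) for exact-match spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


def gold_span(text, name):
    """Build a gold annotation from the first occurrence of name in text."""
    start = text.index(name)
    return (start, start + len(name), name)


text = "Mutations in BRCA1 and MMAC1 were observed, but TP53 was intact."
gold = {gold_span(text, name) for name in ("BRCA1", "MMAC1", "TP53")}

predicted = tag_genes(text)
precision, recall, f = f_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} F={f:.2f}")
```

In this toy case the tagger finds two of the three gold mentions and makes no false predictions, so precision is perfect while recall, and with it the F-score, is lower. Exact span matching is only one possible scoring convention, and it is an assumption here.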

Related papers

The widely cited Pang et al. EMNLP 2002 paper was influenced by this paper, but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.

An interesting follow-up paper is Turney and Littman, TOIS 2003, which focuses on evaluating the use of PMI to predict the semantic orientation of words.