Cohen and Hersh Briefings in Bioinformatics 2005

Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.

Online version

oxfordjournals

Summary

This paper surveys the state of biomedical text mining as of 2005.

Named entity recognition
- Problems
  - No complete dictionary exists for most types of biological named entities
  - Words and phrases are ambiguous
  - A single entity can have multiple names
- Approaches fall into three main categories
  - Lexicon-based
  - Rule-based
  - Statistically based
- Performance
  - Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
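
For reference, the F-score quoted above is conventionally the harmonic mean of precision and recall. The following is a minimal sketch assuming the balanced (F1) variant; the function name `f_score` and the counts are made up for illustration and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1) from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted entities that are correct
    recall = tp / (tp + fn)     # fraction of true entities that were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 correct entity mentions, 20 spurious, 20 missed.
example = f_score(tp=80, fp=20, fn=20)
```

With these illustrative counts, precision and recall are both 0.8, so the F-score is also 0.8, which sits inside the range reported for gene and protein NER.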

Text classification

Synonym and abbreviation extraction
- Synonyms
  - Dictionary lookup
  - Automatic extraction of gene name synonyms from biomedical free text
  - SVM classifier-based methods
  - Pattern-based methods
- Abbreviations
  - Either the full form or the abbreviation is often enclosed in parentheses.
  - A variety of alignment and scoring methods are used to pair an abbreviation with its full form.
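
The parenthesis cue can be sketched as a simple pattern match. This is a toy heuristic under stated assumptions, not any specific system covered by the survey: it only handles the "full form (ABBR)" order, and it accepts a candidate only when the letters of the abbreviation match the initials of the words just before the parenthesis. The function name `find_abbreviations` and the example sentence are hypothetical.

```python
import re

def find_abbreviations(text: str) -> list[tuple[str, str]]:
    """Return (abbreviation, full form) pairs for 'full form (ABBR)' patterns."""
    pairs = []
    for match in re.finditer(r"\(([A-Za-z]{2,10})\)", text):
        abbr = match.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        if len(candidate) < len(abbr):
            continue  # not enough preceding words to align with
        # Naive alignment: one initial letter per abbreviation character.
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == abbr.lower():
            pairs.append((abbr, " ".join(candidate)))
    return pairs

sentence = "Mutations affect the epidermal growth factor receptor (EGFR) pathway."
pairs = find_abbreviations(sentence)
```

Real alignment and scoring methods are more permissive than this: they allow skipped words, letters taken from inside a word, and the reversed "ABBR (full form)" order.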

Relationship extraction
- Goal: detect occurrences of a prespecified type of relationship between a pair of entities of given types
- Manually generated template-based methods
- Automatic template methods
- Statistical methods
- NLP-based methods

Most of this work concerns relationships between genes and proteins.
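
A manually generated template can be as simple as a hand-written pattern over entity mentions. The sketch below is an illustration only: the protein list, the verb list, the function name `extract_interactions`, and the example sentence are all assumptions, and a real system would take entity mentions from an NER step rather than a fixed list.

```python
import re

# Hypothetical mini-lexicon; a real system would use NER output instead.
PROTEINS = ["RAD51", "BRCA2", "TP53", "MDM2"]
VERBS = ["binds", "interacts with", "phosphorylates", "inhibits"]

entity = "|".join(map(re.escape, PROTEINS))
verb = "|".join(map(re.escape, VERBS))
# Template: <protein> <interaction verb> <protein>
TEMPLATE = re.compile(rf"\b({entity})\s+({verb})\s+({entity})\b")

def extract_interactions(text: str) -> list[tuple[str, str, str]]:
    """Return (agent, relation, target) triples that match the template."""
    return [match.groups() for match in TEMPLATE.finditer(text)]

triples = extract_interactions("We show that MDM2 inhibits TP53 and BRCA2 binds RAD51.")
```

A single rigid template like this misses passive constructions, negation, and anything spanning clauses, which is one motivation for the automatic, statistical, and NLP-based methods listed above.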

Hypothesis generation
- Uncover previously unrecognized relationships: ones that are not stated in the text but can be inferred from other, more explicit relationships.
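
One common way to frame this kind of inference is the "ABC" pattern: if A is linked to B and B is linked to C, but A and C are never linked directly, then A-C becomes a candidate hypothesis. The sketch below is a minimal illustration under that assumption. The function name `candidate_hypotheses` is hypothetical, and the two input pairs are toy data modelled loosely on a well-known literature-based discovery example, not results extracted from any corpus.

```python
from collections import defaultdict

def candidate_hypotheses(explicit_pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Infer A-C pairs that share an intermediate B but are never stated directly."""
    neighbours = defaultdict(set)
    for a, b in explicit_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)  # treat relationships as symmetric
    inferred = set()
    for linked in neighbours.values():
        for a in linked:
            for c in linked:
                # a < c keeps one ordering of each unordered pair.
                if a < c and c not in neighbours[a]:
                    inferred.add((a, c))
    return inferred

explicit = {
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's disease"),
}
hypotheses = candidate_hypotheses(explicit)
```

In practice the candidate list is very large, so systems rank or filter candidates instead of returning every transitive pair.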

Integration frameworks
- Integrated text-mining frameworks
- Still in the research and development phase.

The authors' suggestions
- Access to full text is required
- Particular applications require additional analytical methods and features
- Researchers should consider actual users' needs. Good performance on a given metric does not guarantee user satisfaction.
- Shared challenge tasks should be continued

Related papers

The widely cited Pang et al EMNLP 2002 paper was influenced by this paper - but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.

An interesting follow-up paper is Turney and Littman, TOIS 2003 which focuses on evaluation of the technique of using PMI for predicting the semantic orientation of words.
