Difference between revisions of "Cohen and Hersh Briefings in Bioinformatics 2005"

Revision as of 00:04, 1 October 2010

Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.

Online version

oxfordjournals

Summary

This paper is a 2005 survey of biomedical text mining.

The authors describe the state of the art as of 2005 for each of the text-mining tasks listed below.

Named entity recognition
- Problems
  - No complete dictionary for most types of biological named entities
  - ambiguous words and phrases
  - entities with several names, and multi-word names
- approaches fall mainly into three categories
  - lexicon based
  - rule based
  - statistically based
- performance
  - overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
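
The F-score figures quoted for NER can be made concrete with a small sketch. It assumes exact-match scoring over sets of (start, end, type) mentions and the balanced F1 measure; the survey summary does not say which matching criterion or beta value individual systems used, so both are assumptions here, and the function name `f_score` is illustrative.

```python
def f_score(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of entity mentions.

    Mentions are compared by exact equality (an assumption; evaluations may
    also use partial or boundary-relaxed matching).
    """
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Made-up mentions as (start offset, end offset, entity type).
gold_mentions = {(0, 4, "GENE"), (10, 15, "GENE"), (20, 25, "PROTEIN"), (30, 34, "GENE")}
system_mentions = {(0, 4, "GENE"), (10, 15, "GENE"), (20, 25, "PROTEIN"), (40, 44, "GENE")}
precision, recall, f1 = f_score(gold_mentions, system_mentions)
```

With three of four predictions correct and three of four gold mentions found, precision, recall and F1 all come out at 0.75, the lower end of the range reported above.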

Text classification

Synonym and abbreviation extraction
- Synonym
  - dictionary lookup
  - automatic extraction of gene name synonyms from biomedical free text
  - SVM classifier-based
  - pattern-based
- Abbreviation
  - either the full form or the abbreviation is often enclosed in parentheses.
  - a variety of alignment and scoring methods are used to pair abbreviations with their full forms
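
The parenthesis cue for abbreviations can be sketched as follows. This is a deliberately simplified heuristic, not one of the alignment and scoring methods covered by the survey: it only handles the case where the short form is in parentheses and each of its characters is the first letter of one preceding word. The name `find_abbreviations` and the regular expression are assumptions made for illustration.

```python
import re

# A parenthesised token of 2-10 characters that starts with a letter.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_abbreviations(sentence):
    """Return (short form, long form) pairs found by first-letter matching."""
    pairs = []
    for match in PAREN.finditer(sentence):
        short = match.group(1)
        letters = [c.lower() for c in short if c.isalnum()]
        # Words to the left of the parenthesis; hyphenated words are split.
        words = re.findall(r"[A-Za-z0-9]+", sentence[:match.start()])
        candidate = words[-len(letters):]
        # Accept only if every letter matches the initial of one word, in order.
        if len(candidate) == len(letters) and all(
            w[0].lower() == c for w, c in zip(candidate, letters)
        ):
            pairs.append((short, " ".join(candidate)))
    return pairs


found = find_abbreviations(
    "Patients with chronic obstructive pulmonary disease (COPD) were enrolled."
)
```

Real abbreviations often skip words or use interior letters, which is why more flexible alignment and scoring schemes are needed in practice.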

Relationship extraction
- goal: detect occurrences of a prespecified type of relationship between a pair of entities of given types
- manually generated template-based methods
- automatic template methods
- statistical methods
- NLP-based methods

Most of this work concerns relationships between genes and proteins.
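
As a rough illustration of the manually generated template approach, the sketch below applies a single hand-written pattern of the form "ENTITY VERB ENTITY". The entity list is assumed to come from an earlier NER step, and the verb list, the function name `extract_relations` and the example sentence are all hypothetical; real template systems use many patterns and richer linguistic constraints.

```python
import re


def extract_relations(sentence, entities, verbs=("activates", "inhibits", "binds")):
    """Return (entity, verb, entity) triples matching one fixed template."""
    if not entities or not verbs:
        return []
    # Longest names first so a longer name wins over one of its prefixes.
    entity_alt = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    verb_alt = "|".join(re.escape(v) for v in verbs)
    template = re.compile(rf"\b({entity_alt})\s+({verb_alt})\s+({entity_alt})\b")
    return [m.groups() for m in template.finditer(sentence)]


relations = extract_relations(
    "We show that MDM2 inhibits p53 and that ATM activates p53 after damage.",
    {"MDM2", "p53", "ATM"},
)
```

A template like this is precise but brittle: passive voice, intervening words or negation all fall outside the pattern, which is part of the motivation for the automatic, statistical and NLP-based methods listed above.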

Hypothesis generation
- uncover previously unrecognized relationships: ones that are not stated in the text but can be inferred from other, more explicit relationships
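
One simple way to picture this kind of inference is a transitive "A relates to B, B relates to C, so A may relate to C" pattern over extracted pairs. The sketch below is a minimal version under that assumption; the function name `infer_hidden_links` and the entity names are placeholders rather than findings, and the summary above does not prescribe this particular procedure.

```python
from collections import defaultdict


def infer_hidden_links(explicit_pairs):
    """Map each unlinked pair (a, c) to the intermediates b linking both."""
    neighbours = defaultdict(set)
    for a, b in explicit_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    hypotheses = defaultdict(set)
    for b, linked in neighbours.items():
        for a in linked:
            for c in linked:
                # a < c keeps each unordered pair once; skip pairs that are
                # already explicitly related in the input.
                if a < c and c not in neighbours[a]:
                    hypotheses[(a, c)].add(b)
    return dict(hypotheses)


# Placeholder relationships, as if produced by a relationship-extraction step.
explicit = [
    ("drug_X", "pathway_Y"),
    ("pathway_Y", "disease_Z"),
    ("drug_X", "gene_W"),
    ("gene_W", "disease_Z"),
]
candidates = infer_hidden_links(explicit)
```

Here `drug_X` and `disease_Z` never appear together in the input, yet they share two intermediates, so the pair surfaces as a candidate hypothesis. Any practical system would still need ranking and filtering, since most such indirect links are uninteresting.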

Integration frameworks
- integrated text-mining frameworks

Related papers

The widely cited Pang et al EMNLP 2002 paper was influenced by this paper - but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.

An interesting follow-up paper is Turney and Littman, TOIS 2003 which focuses on evaluation of the technique of using PMI for predicting the semantic orientation of words.

