Difference between revisions of "Cohen and Hersh Briefings in Bioinformatics 2005"

(Created page with '== Citation == Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71. == Online version…')
 
Line 8: Line 8:
  
 
== Summary ==
 
== Summary ==
 +
This is a survey paper about biomedical text mining in 2005.
  
The paper presents a MEDical Information Extraction (MedIE) system, which extracts patient information from free-text clinical records.  
+
They describe the state of the art in 2005 for each distinct type of text-mining task below.
  
They divided their extraction job into three tasks below.
+
* Named entity recognition
* extraction of medical terms
+
** Problems
* relation extraction
+
*** No complete dictionary for most types of biological named entities
** extraction of associated medical concepts
+
*** ambiguous words and phrases
** e.g. Blood pressure & 144/90 in the sentence, "Blood pressure is 144/90"
+
*** multiple names for the same entity
* text classification
+
** approaches
** e.g. a patient can be classified as a former smoker, a current smoker, or a non-smoker
+
*** lexicon based
 +
*** rule based
 +
**** AbGene system of Tanabe and Wilbur
 +
**** GAPSCORE system
 +
*** statistically based
 +
** performance
 +
*** Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
  
Their approaches are:
+
* Text classification
* An ontology-based approach for extracting medical terms of interest
 
** they used Unified Medical Language System (UMLS)
 
** About terms that are not defined in UMLS, they predicted categories of some terms using sentence structures.
 
* A graph-based approach which uses the parsing result of link-grammar parser for relation-extraction
 
** They included the processing of negation.
 
** When the parser fails, they used a pattern-based approach.
 
** Because the parser did not process multi-word terms, they replaced the terms with placeholders.
 
* an NLP-based feature extraction method coupled with an ID3-based decision tree for text classification
 
  
  
This approach was fairly successful, mostly achieving over 80% precision and recall. However, the system was tested on data written by a single clinician, which means that the style of the free-text records was consistent. Nevertheless, the research is worthwhile in that the authors applied various IE techniques to free-text clinical records and explained the problems they encountered.
+
* Synonym and abbreviation extraction
 +
 
 +
* Relationship extraction
 +
 
 +
* Hypothesis generation
 +
 
 +
* Integration frameworks
  
 
== Related papers ==
 
== Related papers ==

Revision as of 23:49, 30 September 2010

Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.

Online version

oxfordjournals

Summary

This is a survey paper about biomedical text mining in 2005.

The authors describe the state of the art as of 2005 for each of the text-mining tasks listed below.

  • Named entity recognition
    • Problems
      • No complete dictionary exists for most types of biological named entities
      • Ambiguous words and phrases
      • Multiple names for the same entity
    • Approaches
      • Lexicon-based
      • Rule-based
        • AbGene system of Tanabe and Wilbur
        • GAPSCORE system
      • Statistically based
    • Performance
      • Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
  • Text classification
  • Synonym and abbreviation extraction
  • Relationship extraction
  • Hypothesis generation
  • Integration frameworks
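
As an illustration of the lexicon-based NER approach and the F-score figure mentioned in the list above, here is a minimal sketch. It is not taken from the survey or from any of the systems it names; the toy lexicon, the example sentence, and the function names `lexicon_tag` and `f_score` are all hypothetical. It matches exact dictionary terms in text and scores the result against a hand-made gold standard using the balanced F-score (the harmonic mean of precision and recall).

```python
import re

# Hypothetical toy lexicon; real systems draw on large curated databases.
LEXICON = {"BRCA1", "p53"}


def lexicon_tag(text, lexicon=LEXICON):
    """Return sorted (start, end, surface) spans for exact lexicon matches."""
    spans = []
    for term in lexicon:
        # Require non-alphanumeric neighbours so "p53" does not match in "p530".
        pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), m.group()))
    return sorted(spans)


def f_score(predicted, gold):
    """Balanced F-score over exact-match spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "Mutations in BRCA1 and p53 were found, but not in CHK2."
    predicted = lexicon_tag(text)
    # Assumed gold standard: the two lexicon hits plus CHK2, which the lexicon lacks.
    start = text.index("CHK2")
    gold = predicted + [(start, start + len("CHK2"), "CHK2")]
    print(predicted)
    print(f_score(predicted, gold))
```

In this toy run the tagger misses CHK2 because the term is absent from the lexicon, which lowers recall. This is the "no complete dictionary" problem from the list in miniature.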

Related papers

The widely cited Pang et al. EMNLP 2002 paper was influenced by this paper but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.

An interesting follow-up paper is Turney and Littman, TOIS 2003, which focuses on evaluating the technique of using PMI to predict the semantic orientation of words.