Difference between revisions of "Cohen and Hersh Briefings in Bioinformatics 2005"

Latest revision as of 00:23, 1 October 2010

Citation

Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.

Online version

oxfordjournals

Summary

This is a survey of biomedical text-mining research as of 2005.

Named entity recognition
- Problems
  - No complete dictionary for most types of biological named entities
  - ambiguous words and phrases
  - multiple names for the same entity
- approaches fall mainly into three categories
  - lexicon-based
  - rule-based
  - statistically based
- performance
  - overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
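To make the lexicon-based approach and the F-score figure concrete, here is a minimal Python sketch, assuming a tiny hand-made lexicon and a toy gold annotation (both hypothetical, not taken from the survey). It tags entities by greedy longest-match dictionary lookup and scores them with exact-span precision P, recall R, and F1 = 2PR/(P+R).

```python
# Toy lexicon (hypothetical entries, for illustration only): lowercase surface form -> type.
LEXICON = {
    "tumor necrosis factor": "PROTEIN",
    "tnf": "PROTEIN",
    "brca1": "GENE",
    "p53": "GENE",
}


def tag_entities(text, lexicon):
    """Greedy longest-match dictionary lookup; returns (start, end, type) spans.

    Assumes ASCII text, so lowercasing does not shift character offsets.
    """
    lowered = text.lower()
    # Try longer entries first so multi-word names win over shorter ones.
    entries = sorted(lexicon, key=len, reverse=True)
    spans, i = [], 0
    while i < len(lowered):
        match = None
        # Only start a match at a word boundary.
        if i == 0 or not lowered[i - 1].isalnum():
            for entry in entries:
                end = i + len(entry)
                # The match must also end at a word boundary.
                if lowered.startswith(entry, i) and (
                    end == len(lowered) or not lowered[end].isalnum()
                ):
                    match = (i, end, lexicon[entry])
                    break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans


def prf(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, type) spans."""
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


text = "BRCA1 and p53 interact; TNF-alpha levels rose."
predicted = tag_entities(text, LEXICON)
# Hypothetical gold annotation for this toy sentence: "TNF-alpha" is one entity.
gold = [(0, 5, "GENE"), (10, 13, "GENE"), (24, 33, "PROTEIN")]
print(predicted)
print(prf(predicted, gold))
```

Exact span matching is strict: the lexicon only knows "TNF", so the truncated match counts as both a false positive and a false negative, and P, R, and F1 all come out to 2/3 here. This kind of boundary error is one reason incomplete dictionaries limit lexicon-based systems.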

Text classification

Synonym and abbreviation extraction
- Synonym
  - dictionary-based lookup
  - automatic extraction of gene name synonyms from biomedical free text
  - SVM classifier-based
  - pattern-based
- abbreviation
  - either the full form or the abbreviation is often enclosed in parentheses.
  - methods differ mainly in how they align and score candidate pairs of full form and abbreviation
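The parenthesis cue can be illustrated with a minimal sketch of one simple alignment heuristic: each parenthesized token is treated as a candidate abbreviation, and its characters are matched right-to-left against the words just before the parenthesis, with the first character required to start a word. This is an assumed, simplified variant with no scoring step, and the window size and filters are hypothetical choices, not those of any particular system the survey reviews.

```python
import re

# Parenthesized candidate short forms of 2-10 characters, e.g. "(TNF)".
PAREN = re.compile(r"\(([^()]{2,10})\)")


def find_long_form(short, candidate):
    """Align `short` right-to-left against `candidate`; return the long form or None."""
    c = len(candidate) - 1
    for s in range(len(short) - 1, -1, -1):
        ch = short[s].lower()
        if not ch.isalnum():
            continue  # ignore punctuation inside the short form
        # Scan left for a matching character; the first short-form character
        # must additionally match at the start of a word.
        while c >= 0 and (
            candidate[c].lower() != ch
            or (s == 0 and c > 0 and candidate[c - 1].isalnum())
        ):
            c -= 1
        if c < 0:
            return None
        c -= 1
    # c + 1 is where the first short-form character matched.
    return candidate[c + 1:]


def extract_abbreviations(sentence):
    """Return (short form, long form) pairs for 'long form (SF)' patterns."""
    pairs = []
    for m in PAREN.finditer(sentence):
        short = m.group(1).strip()
        # Heuristic filters: at most two words, starting with a letter or digit.
        if not short or not short[0].isalnum() or len(short.split()) > 2:
            continue
        # Only consider a limited window of words before the parenthesis.
        window = min(len(short) + 5, 2 * len(short))
        words = sentence[: m.start()].split()
        candidate = " ".join(words[-window:])
        long_form = find_long_form(short, candidate)
        if long_form and len(long_form) > len(short):
            pairs.append((short, long_form))
    return pairs


print(extract_abbreviations(
    "Tumor necrosis factor (TNF) binds the epidermal growth factor receptor (EGFR)."
))
```

The reverse case, where the abbreviation precedes a parenthesized full form, would need a second pass with the roles swapped; the sketch handles only the "full form (abbreviation)" order.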

Relationship extraction
- detect occurrences of a prespecified type of relationship between a pair of entities of given types
- manually generated template-based methods
- automatic template methods
- statistical methods
- NLP-based methods

Most of this work concerns relationships between genes and proteins.
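A manually generated template can be as simple as a regular expression of the form "ENTITY VERB ENTITY". The sketch below assumes hypothetical entity and verb lists and an invented example sentence; real template-based systems use richer patterns and a proper entity recognizer rather than a fixed name list.

```python
import re


def build_template(entities, verbs):
    """Compile one 'ENTITY VERB ENTITY' template from hand-picked lists."""
    # Longer names first so that alternation prefers the longest entity name.
    ent = "|".join(re.escape(e) for e in sorted(entities, key=len, reverse=True))
    verb = "|".join(re.escape(v) for v in verbs)
    return re.compile(
        rf"\b(?P<a>{ent})\s+(?P<rel>{verb})\s+(?P<b>{ent})\b", re.IGNORECASE
    )


def extract_relations(sentence, template):
    """Return (entity, relation, entity) triples matched by the template."""
    return [
        (m.group("a"), m.group("rel").lower(), m.group("b"))
        for m in template.finditer(sentence)
    ]


# Hypothetical entity and verb lists, standing in for a curated dictionary.
template = build_template(
    entities=["RAF1", "MEK1", "ERK2"],
    verbs=["activates", "inhibits", "phosphorylates", "binds to"],
)
print(extract_relations("RAF1 phosphorylates MEK1, and MEK1 activates ERK2.", template))
```

Such templates give high precision on the phrasings they anticipate but miss passive voice, intervening clauses, and negation, which is the usual motivation for the automatic-template, statistical, and NLP-based alternatives listed above.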

Hypothesis generation
- uncover previously unrecognized relationships that are not stated in the text but can be inferred from other, more explicit relationships
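One common way to picture hypothesis generation is the "ABC" pattern: if term A is linked to B and B is linked to C in the literature, but A and C are never linked directly, then A-C is a candidate hypothesis. The sketch below assumes term co-occurrence pairs have already been extracted; the pairs and term names are hypothetical toy data.

```python
from collections import defaultdict


def abc_hypotheses(pairs, start):
    """Map each indirectly linked term C to the intermediate terms B supporting it."""
    graph = defaultdict(set)
    for x, y in pairs:
        graph[x].add(y)
        graph[y].add(x)
    direct = graph[start]  # terms already explicitly linked to the start term (A)
    hypotheses = defaultdict(list)
    for b in sorted(direct):
        for c in sorted(graph[b]):
            # Keep C only if it is not already explicitly linked to A.
            if c != start and c not in direct:
                hypotheses[c].append(b)
    return dict(hypotheses)


# Hypothetical co-occurrence pairs mined from abstracts (toy data).
pairs = [
    ("A", "B1"), ("A", "B2"), ("B1", "C1"), ("B2", "C1"),
    ("B2", "C2"), ("A", "C3"), ("B1", "C3"),
]
print(abc_hypotheses(pairs, "A"))
```

Returning the supporting intermediates, rather than just the C terms, lets a user rank candidates (for example by how many B terms connect them) and inspect the evidence behind each one.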

Integration frameworks
- integrated text-mining frameworks
- still in the research and development phase

The authors' suggestions
- Access to full text is required
- Particular applications require additional analytical methods and features
- Researchers should consider actual users' needs: good performance on a given metric does not guarantee user satisfaction.
- Shared challenge tasks should be continued
