Difference between revisions of "Cohen and Hersh Briefings in Bioinformatics 2005"
From Cohen Courses
Revision as of 23:49, 30 September 2010
Citation
Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.
Online version
Summary
This survey paper covers biomedical text mining as of 2005.
The authors describe the state of the art at that time for each of the distinct text-mining tasks below.
- Named entity recognition
  - Problems
    - No complete dictionary exists for most types of biological named entities
    - Ambiguous words and phrases
    - Multiple names for the same entity
  - Approaches
    - Lexicon based
    - Rule based
      - AbGene system of Tanabe and Wilbur
      - GAPSCORE system
    - Statistically based
  - Performance
    - Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
- Text classification
- Synonym and abbreviation extraction
- Relationship extraction
- Hypothesis generation
- Integration frameworks
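To make the lexicon-based approach and the F-score figures in the named entity recognition item more concrete, here is a minimal sketch in Python. It is not taken from the survey or from any of the systems it names: the function names `lexicon_tag` and `f_score`, the toy sentence, and the tiny lexicon are all assumptions chosen for illustration. It shows how an incomplete dictionary lowers recall, and how precision and recall combine into an F-score.

```python
import re


def lexicon_tag(text, lexicon):
    """Return the set of (start, end, term) spans where a lexicon term occurs.

    Hypothetical helper: plain case-sensitive, whole-token matching only.
    """
    spans = set()
    for term in lexicon:
        # Require that the match is not embedded in a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            spans.add((match.start(), match.end(), term))
    return spans


def f_score(predicted, gold):
    """Return (precision, recall, F) for predicted versus gold entity spans.

    F is the balanced F-score: the harmonic mean of precision and recall.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data (assumed, not from the paper).
    text = "BRCA1 interacts with p53 and RAD51."
    lexicon = {"BRCA1", "p53", "CAT"}  # deliberately missing RAD51

    predicted = lexicon_tag(text, lexicon)
    # Gold annotations: everything the lexicon found, plus the missed gene.
    start = text.index("RAD51")
    gold = predicted | {(start, start + len("RAD51"), "RAD51")}

    precision, recall, f = f_score(predicted, gold)
    print(f"precision={precision:.2f} recall={recall:.2f} F={f:.2f}")
```

In this toy case the lexicon finds two of the three gold entities with no false positives, so precision is 1.0, recall is 2/3, and the F-score is about 0.8. Exact string matching like this does not address the ambiguity or multiple-name problems listed above.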
Related papers
The widely cited Pang et al., EMNLP 2002 paper was influenced by this paper, but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.
An interesting follow-up paper is Turney and Littman, TOIS 2003, which focuses on evaluating the use of PMI for predicting the semantic orientation of words.