Cohen and Hersh Briefings in Bioinformatics 2005
Latest revision as of 00:23, 1 October 2010
Citation
Aaron M. Cohen and William R. Hersh. 2005. A Survey of Current Work in Biomedical Text Mining. Briefings in Bioinformatics. Vol 6. No 1. 57-71.
Online version
Summary
This is a survey paper about biomedical text mining in 2005.
- Named entity recognition
  - Problems
    - No complete dictionary exists for most types of biological named entities
    - ambiguous words and phrases
    - multiple names for the same entity
  - Approaches fall into three main categories
    - lexicon based
    - rule based
    - statistically based
  - Performance
    - Overall, gene and protein NER systems achieve F-scores between 75 and 85 percent.
- Text classification
- Synonym and abbreviation extraction
  - Synonym
    - use of dictionaries
    - automatic extraction of gene name synonyms from biomedical free text
    - SVM classifier-based
    - pattern-based
  - Abbreviation
    - either the full form or the abbreviation is often enclosed in parentheses
    - a variety of alignment and scoring methods are used to pair the two
- Relationship extraction
  - detect occurrences of a prespecified type of relationship between a pair of entities of given types
  - manually generated template-based methods
  - automatic template methods
  - statistical methods
  - NLP-based methods
  - most work concerns relationships between genes and proteins
- Hypothesis generation
  - uncover previously unrecognized relationships that are not stated in the text but are inferred from other, more explicit relationships
- Integration frameworks
  - integrated text-mining frameworks
  - still in the research and development phase
- The authors' suggestions
  - Access to full text is required
  - Additional analytical methods with possible features are required for a particular application
  - Researchers should consider actual users' needs. The performance of a system with certain metrics does not guarantee users' satisfaction.
  - Shared challenge tasks should be continued
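
To make the abbreviation bullet above concrete, here is a minimal sketch of the "full form followed by a parenthesized abbreviation" pattern. It is an illustration only, not a method taken from the survey: the function name `find_abbreviations`, the regular expression, and the naive first-letter alignment are all assumptions chosen for brevity. Real systems use richer alignment and scoring.

```python
import re

# Assumption: an abbreviation is a parenthesized token of 2-10 letters or
# digits that starts with a letter, e.g. "(NER)".
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9]{1,9})\)")


def find_abbreviations(text):
    """Return (long_form, short_form) pairs using naive first-letter alignment.

    Hypothetical helper: the long form is accepted only if the initials of the
    N words before the parenthesis spell the N-character abbreviation.
    """
    pairs = []
    for match in PAREN.finditer(text):
        short = match.group(1)
        # Alphanumeric words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        candidate = words[-len(short):]
        if len(candidate) < len(short):
            continue  # not enough preceding words to align against
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == short.lower():
            pairs.append((" ".join(candidate), short))
    return pairs
```

For the sentence "We apply named entity recognition (NER) and a support vector machine (SVM) classifier.", this sketch returns the pairs ("named entity recognition", "NER") and ("support vector machine", "SVM"). It does not handle the reverse case, where the abbreviation comes first and the full form is in parentheses, nor abbreviations that take more than one letter from a single word.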