Zhou et al ACM symposium on Applied Computing 2006
Revision as of 21:12, 30 September 2010
Citation
Xiaohua Zhou et al. 2006. Approaches to Text Mining for Clinical Medical Records. In Proceedings of the 2006 ACM Symposium on Applied Computing, 235-239.
Online version
Summary
The paper presents a MEDical Information Extraction (MedIE) system, which extracts patient information from free-text clinical records.
The authors divide the extraction job into three tasks:
- extraction of medical terms
- relation extraction
  - extraction of associated medical concepts
  - e.g. "Blood pressure" and "144/90" in the sentence "Blood pressure is 144/90"
- text classification
  - e.g. a patient can be classified as a former smoker, a current smoker, or a non-smoker
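To make the relation-extraction task concrete, the sketch below pulls the (term, value) pair out of the example sentence with a single regular expression. It only illustrates the expected output of the task. The pattern and the name `extract_pair` are assumptions made for this example and are not taken from the paper, whose main method is parser-based.

```python
import re

# Hypothetical pattern: "<term> is|was <value>", where the value is a number
# or a ratio such as 144/90. This is an illustrative assumption, not MedIE's rule set.
PATTERN = re.compile(
    r"^(?P<term>[A-Za-z ]+?)\s+(?:is|was)\s+(?P<value>\d+(?:/\d+)?)"
)

def extract_pair(sentence):
    """Return (term, value) if the sentence matches the pattern, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return (match.group("term").strip(), match.group("value"))

print(extract_pair("Blood pressure is 144/90"))  # ('Blood pressure', '144/90')
```

A single pattern like this misses most real phrasing, which is presumably why a pattern-based method is better suited as a fallback than as the primary extractor.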
Their approaches are:
- An ontology-based approach for extracting medical terms of interest
  - They used the Unified Medical Language System (UMLS).
  - For terms not defined in UMLS, they predicted the categories of some terms from sentence structure.
- A graph-based approach to relation extraction, which uses the output of a link-grammar parser
  - They included the processing of negation.
  - When the parser failed, they fell back on a pattern-based approach.
  - Because the parser did not handle multi-word terms, they replaced such terms with placeholders.
- An NLP-based feature extraction method coupled with an ID3-based decision tree for text classification
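The ID3 step in the last approach can be sketched as follows. ID3 chooses, at each node, the feature with the highest information gain (the reduction in label entropy after splitting on that feature). The feature names (`mentions_quit`, `negated_smoking`) and the six toy records are hypothetical, invented for this example. The paper's actual features come from its NLP-based extraction and are not reproduced here.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def best_feature(rows, labels):
    """Feature that ID3 would split on first: the one with maximum information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: one dict of boolean features per patient record.
rows = [
    {"mentions_quit": True,  "negated_smoking": False},
    {"mentions_quit": True,  "negated_smoking": False},
    {"mentions_quit": False, "negated_smoking": True},
    {"mentions_quit": False, "negated_smoking": False},
    {"mentions_quit": False, "negated_smoking": False},
    {"mentions_quit": False, "negated_smoking": False},
]
labels = ["former", "former", "non-smoker", "current", "current", "current"]

print(best_feature(rows, labels))  # mentions_quit
```

A full ID3 implementation would apply `best_feature` recursively to each subset until the labels are pure or no features remain. Only the split criterion is shown here.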
These approaches were fairly successful, mostly achieving over 80% precision and recall. However, the system was tested only on records written by a single clinician, so the style of the free-text records was consistent.
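For reference, precision and recall for an extraction task can be computed as in the sketch below. The function name and the sample items are hypothetical and are not drawn from the paper's evaluation data. Precision is the fraction of extracted items that are correct, and recall is the fraction of gold-standard items that were extracted.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for a set of extracted items against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical example: 5 items extracted, 4 of them in a gold standard of 5.
extracted = {"blood pressure", "pulse", "weight", "height", "allergy"}
gold = {"blood pressure", "pulse", "weight", "height", "temperature"}
print(precision_recall(extracted, gold))  # (0.8, 0.8)
```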
Related papers
The widely cited Pang et al EMNLP 2002 paper was influenced by this paper, but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.
An interesting follow-up paper is Turney and Littman, TOIS 2003, which focuses on evaluating the use of PMI for predicting the semantic orientation of words.