Difference between revisions of "Pestian et al BioNLP 2007"

 
 
  
 
[http://delivery.acm.org/10.1145/1580000/1572411/p97-pestian.pdf?key1=1572411&key2=4491095821&coll=GUIDE&dl=GUIDE&CFID=103951925&CFTOKEN=14314827 ACM portal]
 
 
== Summary ==
 
 
The paper presents a MEDical Information Extraction (MedIE) system, which extracts patient information from free-text clinical records.
 
 
The authors divided the extraction job into three tasks:
 
* extraction of medical terms
 
* relation extraction
 
** extraction of associated medical concepts
 
** e.g. "Blood pressure" and "144/90" in the sentence "Blood pressure is 144/90"
 
* text classification
 
** e.g. a patient can be classified as a former smoker, a current smoker, or a non-smoker
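
The relation-extraction example above (a concept paired with its value) can be illustrated with a minimal sketch. The regular expression and the function name <code>extract_pairs</code> are assumptions made for illustration; they are not taken from the paper, and a real system would need far broader patterns.

```python
import re

# Hypothetical pattern: "<concept> is|was|of <numeric value>".
# Illustrative only; not the pattern set used in the paper.
PAIR = re.compile(
    r"(?P<concept>[A-Za-z][A-Za-z ]*?)\s+(?:is|was|of)\s+"
    r"(?P<value>\d+(?:[/.]\d+)?)"
)


def extract_pairs(sentence):
    """Return (concept, value) pairs found in a single sentence."""
    return [
        (m.group("concept").strip(), m.group("value"))
        for m in PAIR.finditer(sentence)
    ]
```

Calling <code>extract_pairs("Blood pressure is 144/90")</code> pairs the concept "Blood pressure" with the value "144/90"; a sentence with no numeric value yields an empty list.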
 
 
Their approaches are:
 
* An ontology-based approach for extracting medical terms of interest
 
** they used Unified Medical Language System (UMLS)
 
** For terms not defined in UMLS, they predicted the categories of some terms from sentence structure.
 
* A graph-based approach that uses the output of a link-grammar parser for relation extraction
 
** They included the processing of negation.
 
** When the parser failed, they fell back on a pattern-based approach.
 
** Because the parser did not process multi-word terms, they replaced the terms with placeholders.
 
* An NLP-based feature extraction method coupled with an ID3-based decision tree for text classification
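
ID3 grows a decision tree by repeatedly splitting on the feature with the highest information gain. The sketch below shows only that selection step. The representation of each record as a dict of feature values and the function names are assumptions for illustration; the summary does not describe the paper's actual features.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy obtained by splitting rows on one feature."""
    total = len(labels)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Feature that ID3 would split on first: the one with the highest gain."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))
```

For the smoking-status task, each row could hold hypothetical boolean features such as whether the record mentions quitting, with labels such as former, current, or non-smoker; <code>best_feature</code> then returns the feature to place at the root of the tree.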
 
 
 
The approach was fairly successful, with precision and recall mostly above 80%. However, the system was tested only on records written by a single clinician, so the style of the free-text records was consistent. Nevertheless, the research is worthwhile in that the authors applied a variety of IE techniques to free-text clinical records and explained the problems they encountered.
 
  
 
== Related papers ==
 

Latest revision as of 23:10, 30 September 2010

Citation

Pestian et al. 2007. A Shared Task Involving Multi-label Classification of Clinical Free Text. In Proceedings of the BioNLP 2007, 97-104.

Online version

ACM portal

Related papers

The widely cited Pang et al EMNLP 2002 paper was influenced by this paper - but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.

An interesting follow-up paper is Turney and Littman, TOIS 2003, which focuses on evaluating the use of PMI to predict the semantic orientation of words.