Difference between revisions of "Hyeju Jang et al IRI 2006"

Latest revision as of 16:06, 21 October 2010

Citation

Hyeju Jang, Yun Jin, Sung Hyon Myaeng. 2006. Integration of Low Level Linguistic Information for Clinical Document Semantic Tagging. In Proceedings of the IEEE Conference on Information Reuse and Integration, 292-297.

Online version

IEEE Xplore: http://ieeexplore.ieee.org/iel5/4018442/4018443/04018505.pdf

Summary

The paper presents a semantic tagger that extracts "symptom", "therapy", and "performance" phrases from free-text clinical records. The task can be viewed as a high-level, phrase-based form of Named Entity Recognition, intended to help answer the two questions below.

1. How can X be used in the treatment of Y?

2. What are the performance characteristics of X in the setting of Y?


The data set consists of 300 narrative "progress after hospital stay" sections taken from Clinical Document Architecture (CDA) documents provided by Seoul National University Hospital. The data is not public.

The texts were written by Korean doctors. They contain many specialized medical words, abbreviations, and non-alphanumeric symbols. In addition, they mix Korean and English: English is mostly used for medical terminology, and Korean for some general nouns and most verbs, although there are exceptions depending on the writing style of the individual doctor. Some examples are below.

  • Specialized medical words
    • Ex) hypothyroidism , hypertensive
  • Abbreviations
    • Ex) ACA, MRI, CR
  • Non-alphanumeric symbols
    • Ex)  , ↑
  • Numeric data
    • Ex) 0.24/73.2 , 1-2회 (1-2 times) , 12.1% , 02.12.16-03.1.2
  • Mixed Korean and English
    • English for medical terminology
    • Korean for some general nouns and most verbs
    • Ex) 시행한 basal hormone level 에서 ("in the basal hormone level that was performed"), clinical 하게는 cushinoid feature 처럼 보이나 ("clinically it looks like a cushingoid feature, but")
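
The token categories listed above can be told apart with simple character-level checks. The following is a minimal sketch, not the paper's preprocessing: the function names (char_class, token_type), the category labels, and the whitespace tokenization are all assumptions made for illustration.

```python
def char_class(ch: str) -> str:
    """Coarse script class of a single character (hypothetical helper)."""
    if "\uac00" <= ch <= "\ud7a3":        # precomposed Hangul syllables
        return "hangul"
    if ch.isascii() and ch.isalpha():     # plain Latin letters
        return "latin"
    if ch.isdigit():
        return "digit"
    return "symbol"                       # punctuation, arrows, etc.


def token_type(token: str) -> str:
    """Label a whitespace-delimited token by the scripts it contains."""
    classes = {char_class(ch) for ch in token}
    if classes == {"hangul"}:
        return "korean"
    if classes == {"latin"}:
        return "english"
    if "digit" in classes and not classes & {"hangul", "latin"}:
        return "numeric"                  # digits, possibly with ./%- etc.
    if classes == {"symbol"}:
        return "symbol"
    return "mixed"                        # e.g. digits followed by Hangul


# Example taken from the list above, split on whitespace.
sentence = "시행한 basal hormone level 에서"
labels = [token_type(tok) for tok in sentence.split()]
```

On the example sentence this labels the first and last tokens as Korean and the three medical terms as English, which reflects the mixed-language pattern described above.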

The system uses a Hidden Markov Model (HMM) over equivalence classes derived from primitive tagging results, such as UMLS and part-of-speech (POS) tags, to address a severe data sparsity problem.
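
To make the idea concrete, here is a minimal sketch of HMM decoding over equivalence classes rather than raw words. It is not the authors' implementation: the equivalence_class helper, the class labels (such as "DISORDER|NN"), the state set, and every probability below are toy assumptions chosen only for illustration. The point is that a rare or unseen word still maps to a class the model has seen, which is how such classes can ease data sparsity.

```python
import math


def equivalence_class(umls_type, pos_tag):
    """Collapse a token to a class built from its primitive tags (hypothetical)."""
    return f"{umls_type or 'NONE'}|{pos_tag}"


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely state sequence; missing probabilities are floored."""
    if not observations:
        return []

    def lg(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous state)
    best = [{s: (lg(start_p.get(s, 0)) + lg(emit_p[s].get(observations[0], 0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + lg(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lg(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model: all numbers are made up for illustration, not taken from the paper.
STATES = ["symptom", "therapy", "other"]
START_P = {"symptom": 0.4, "therapy": 0.2, "other": 0.4}
TRANS_P = {
    "symptom": {"symptom": 0.5, "therapy": 0.2, "other": 0.3},
    "therapy": {"symptom": 0.2, "therapy": 0.5, "other": 0.3},
    "other": {"symptom": 0.3, "therapy": 0.3, "other": 0.4},
}
EMIT_P = {
    "symptom": {"DISORDER|NN": 0.7, "NONE|VV": 0.1, "NONE|NN": 0.2},
    "therapy": {"PROCEDURE|NN": 0.7, "NONE|VV": 0.2, "NONE|NN": 0.1},
    "other": {"NONE|VV": 0.5, "NONE|NN": 0.4, "DISORDER|NN": 0.05, "PROCEDURE|NN": 0.05},
}

OBSERVATIONS = [
    equivalence_class("DISORDER", "NN"),
    equivalence_class(None, "VV"),
    equivalence_class("PROCEDURE", "NN"),
]
TAGS = viterbi(OBSERVATIONS, STATES, START_P, TRANS_P, EMIT_P)
# TAGS == ['symptom', 'other', 'therapy'] for these toy parameters
```

In a real system the start, transition, and emission tables would be estimated from the tagged training documents; here they are fixed by hand so the decoding step can be followed in isolation.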

[Figure: Seta.png]

The evaluation used 200 documents for training and 100 documents for testing, with 3-fold validation. The performance of the system is modest, at approximately 70%. The authors suggest that it could be improved by taking more aspects of the language into account.
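
The 200/100 split with 3-fold validation is consistent with a standard 3-fold scheme over the 300 documents, where each fold of 100 documents is held out once. The sketch below assumes that reading; the function name k_fold_splits and the integer document ids are hypothetical, and the summary does not say how documents were actually assigned to folds.

```python
def k_fold_splits(doc_ids, k=3):
    """Yield (train, test) lists; each of the k folds is the test set once."""
    folds = [doc_ids[i::k] for i in range(k)]   # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [d for j, fold in enumerate(folds) if j != i for d in fold]
        yield train, test


doc_ids = list(range(300))                      # placeholder ids for 300 documents
splits = list(k_fold_splits(doc_ids, k=3))
sizes = [(len(train), len(test)) for train, test in splits]
```

Under this assumption every round trains on 200 documents and tests on the remaining 100, and a reported score would be the average over the three rounds.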