Difference between revisions of "Hyejuj project abstract"

From Cohen Courses

Latest revision as of 04:48, 9 October 2010

What I plan to do

I propose a semantic tagger that assigns high-level concept labels to phrases in clinical narrative texts. I will use clinical narrative documents written by Korean doctors. The high-level concepts to be annotated are listed below.

Target Semantic Tags

  • Symptom
  • Diagnosis
  • Test
  • Test Result
  • Treatment Plan
  • Treatment
  • Treatment Stop
  • Performance
  • Patient Result

Motivation

Clinical documents are an invaluable source of information for medical research and future treatment planning. However, hospitals do not use them efficiently, and most of the work is performed manually because no tools exist in Korea to process such clinical texts automatically. Semantic tagging of clinical documents will help in developing applications that are useful for doctors.

Interesting point

The clinical documents are written in both Korean and English. Usually, English is used for medical terminology, and Korean is used for some general nouns and most verbs, though there are many exceptions.

Dataset

I have 600 clinical narrative documents. They have been tagged automatically with Unified Medical Language System (UMLS) and Part-of-Speech (POS) tags. They have also been tagged manually with the target semantic tags.

Evaluation

The performance of the system can be measured as annotation accuracy, calculated as the number of correct tags divided by the total number of tags.
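
The accuracy measure above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `tag_accuracy` and the example tag sequences are made up for this sketch, and it assumes one gold tag and one predicted tag per phrase.

```python
from typing import Sequence


def tag_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the number of correct tags divided by the total number of tags."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty tag sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Illustrative (made-up) tags for four phrases; the third prediction is wrong.
gold = ["Symptom", "Test", "Test Result", "Treatment"]
predicted = ["Symptom", "Test", "Diagnosis", "Treatment"]
accuracy = tag_accuracy(gold, predicted)  # 3 correct out of 4
```

Accuracy treats all nine target tags equally, so a frequent tag can dominate the score; that is a property of the measure, not of this sketch.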

Techniques that can be used to solve this problem

  • Use UMLS, POS, abbreviation, clue-word, and numerical information to produce higher-level concept information.
  • Use a Conditional Random Field (CRF).
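
As a rough sketch of how the low-level information listed above might be turned into CRF input, the Python below builds one feature dictionary per token, a format that several CRF toolkits accept. Everything here is an assumption made for illustration: the token keys (`word`, `pos`, `umls`, `clue`, `abbrev`), the feature names, the POS labels, and the example sentence are hypothetical, and the script feature is simply one way to expose the Korean/English mix to the model.

```python
import re
from typing import Dict, List, Union

# Hypothetical token record: keys "word", "pos", "umls", "clue", "abbrev".
Token = Dict[str, str]
Features = Dict[str, Union[str, bool]]

HANGUL = re.compile(r"[\uac00-\ud7a3]")  # precomposed Hangul syllables
LATIN = re.compile(r"[A-Za-z]")


def script_of(word: str) -> str:
    """Coarse script label for a token: korean, english, or other."""
    if HANGUL.search(word):
        return "korean"
    if LATIN.search(word):
        return "english"
    return "other"


def token_features(sentence: List[Token], i: int) -> Features:
    """Build the feature dictionary for the i-th token of a sentence."""
    tok = sentence[i]
    word = tok["word"]
    feats: Features = {
        "word.lower": word.lower(),
        "pos": tok.get("pos", "NONE"),
        "umls": tok.get("umls", "NONE"),
        "clue": tok.get("clue", "NONE"),
        "is_abbrev": tok.get("abbrev", "no") == "yes",
        "has_digit": any(ch.isdigit() for ch in word),
        "script": script_of(word),
    }
    # Neighbouring POS tags give the CRF a little local context.
    if i == 0:
        feats["BOS"] = True
    else:
        feats["prev.pos"] = sentence[i - 1].get("pos", "NONE")
    if i == len(sentence) - 1:
        feats["EOS"] = True
    else:
        feats["next.pos"] = sentence[i + 1].get("pos", "NONE")
    return feats


def sentence_features(sentence: List[Token]) -> List[Features]:
    return [token_features(sentence, i) for i in range(len(sentence))]


# Made-up two-token example; the POS and UMLS values are illustrative only.
example = [
    {"word": "fever", "pos": "NN", "umls": "Sign or Symptom"},
    {"word": "발생", "pos": "NNG", "clue": "performance"},
]
features = sentence_features(example)
```

The resulting list of dictionaries would be paired with one target semantic tag per token (or per phrase) when training a CRF; which features actually help is the question the project sets out to answer.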

What question to answer

Can we show good performance on high-level semantic tagging using CRF?

Team Member

Hyeju Jang [hyejuj@cs.cmu.edu]

Related Experience

We developed a semantic tagger using a Hidden Markov Model (HMM) in 2006 [1]. At that time, the target semantic tags were "Symptom", "Therapy", and "Performance".

Since the grammar of the texts is not clear, we treated them as a bag of words and preprocessed them with UMLS tagging and POS tagging. UMLS tagging was performed with MetaMap and MetaMap Transfer (MMTx). Abbreviations in the corpus were also processed based on [2].

After preprocessing, we built equivalence classes to alleviate the data sparseness problem for the HMM. UMLS tags were grouped into cause, disease or symptom, and therapy, which were the targets of this system. Other UMLS tags were ignored since they were not related to the target semantic tags. Clue words for the target semantic tags and numeric terms were also included.

Then, each phrase was represented as a combination of the corresponding equivalence classes. These combinations were used for the HMM instead of the words themselves.

Equivalence Classes

Class | Members
UMLS tag for cause | Biomedical or Dental Material, Food
UMLS tag for disease or symptom | Finding, Sign or Symptom, Disease or Syndrome, Neoplastic Process
UMLS tag for therapy | Diagnostic Procedure, Food, Medical Device, Therapeutic or Preventive Procedure
Clue word for therapy | 처방(prescription), 복용(administer medicine), 시행(operation), 후(after), 이후(later), 사용(use), 증량(increase), 수술(surgery), 중단(discontinue)
Clue word for symptom | 발열(having fever), 관찰(observe)
Clue word for performance | 호전(improvement), 감소(decrease), 상승(rise), 정상(normal), 발생(occurrence), 변화(change)
Numeric for date | Date of the event, time-order information
Numeric for prescription | The frequency of taking medication, dose information
unknown | neither clue word nor UMLS tag
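
To make the phrase representation concrete, here is a minimal Python sketch of mapping a phrase to a combination of the equivalence classes above. It is not the 2006 implementation: the class labels, function names, and example phrase are hypothetical, the two numeric classes are left out because the text does not say how they were recognised, and representing a combination as a sorted tuple of distinct classes (falling back to "unknown" only when nothing matches) is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

# UMLS tags grouped by equivalence class (members copied from the table).
UMLS_CLASSES: Dict[str, Set[str]] = {
    "cause": {"Biomedical or Dental Material", "Food"},
    "disease_or_symptom": {"Finding", "Sign or Symptom",
                           "Disease or Syndrome", "Neoplastic Process"},
    "therapy": {"Diagnostic Procedure", "Food", "Medical Device",
                "Therapeutic or Preventive Procedure"},
}

# Clue words grouped by equivalence class (members copied from the table).
CLUE_CLASSES: Dict[str, Set[str]] = {
    "therapy": {"처방", "복용", "시행", "후", "이후", "사용", "증량", "수술", "중단"},
    "symptom": {"발열", "관찰"},
    "performance": {"호전", "감소", "상승", "정상", "발생", "변화"},
}


def token_classes(word: str, umls_tag: Optional[str] = None) -> List[str]:
    """Return every equivalence class a single token belongs to."""
    classes = [f"UMLS:{name}" for name, members in UMLS_CLASSES.items()
               if umls_tag in members]
    classes += [f"CLUE:{name}" for name, members in CLUE_CLASSES.items()
                if word in members]
    return classes


def phrase_observation(tokens: List[Tuple[str, Optional[str]]]) -> Tuple[str, ...]:
    """Represent a phrase as the sorted set of classes of its tokens."""
    found: Set[str] = set()
    for word, umls_tag in tokens:
        found.update(token_classes(word, umls_tag))
    # Assumption: "unknown" is used only when no token matched any class.
    return tuple(sorted(found)) or ("unknown",)


# Made-up phrase: (word, UMLS tag or None); the UMLS tag is illustrative.
phrase = [("fever", "Sign or Symptom"), ("발생", None)]
observation = phrase_observation(phrase)
```

Note that "Food" appears under both cause and therapy in the table, so a token tagged "Food" falls into two classes here; how the original system resolved that overlap is not stated.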

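References

  • [1] Hyeju Jang, Yun Jin, Sung Hyon Myaeng, "Integration of Low Level Linguistic Information for Clinical Document Semantic Tagging", IEEE Conf. on Information Reuse and Integration, 2006. http://ir.kaist.ac.kr/papers/2006/Integration%20of%20Low%20Level%20Linguistic%20Information%20for%20Clinical%20Document%20Semantic%20Tagging.pdf
  • [2] Sa Kwang Song, Yun Jin, and Sung Hyon Myaeng, "Abbreviation Disambiguation Using Semantic Abstraction of Symbols and Numeric Terms", IEEE International Conference on Natural Language Processing and Knowledge Engineering, pp. 14-19, 2005. http://ir.kaist.ac.kr/papers/2005/Abbreviation%20Disambiguation%20Using%20Semantic%20Abstraction%20of%20Symbols%20and%20Numeric.pdf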