Morante and Daelemans CoNLL 2009
Citation
Morante, R. and Daelemans, W. A metalearning approach to processing the scope of negation. Proceedings of the Thirteenth Conference on Computational Natural Language Learning (2009).
Online Version
[citeseer]
Summary
This paper tackles a novel problem: determining the precise scope of a negation term in biomedical text. Previous papers had shown that the negation status of a given medical concept (like a patient's disease) could be assigned with high accuracy, but none had tried to determine the precise scope of a given negation term. Negation (along with other context cues such as uncertainty) is crucial to proper IE of medical records, and increasingly to other applications such as sentiment detection.
The approach proceeds in two phases: negation signal identification, then scope identification. In the first phase, tokens are tagged as beginning, inside, or outside a negation signal using an information gain decision tree with local features. Some words are unambiguously negative ("no", "lack", "absent", etc.) and are simply assigned as negative automatically.
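The lexicon shortcut in the first phase can be illustrated with a minimal sketch. The function name `tag_cues`, the optional `classify_ambiguous` callback, and the tag strings are hypothetical; the three lexicon entries are the examples given above, not the paper's full list, and the decision tree itself is not reproduced here.

```python
# Illustrative lexicon: only the three example words from the summary.
UNAMBIGUOUS_CUES = {"no", "lack", "absent"}


def tag_cues(tokens, classify_ambiguous=None):
    """Return one tag per token: 'B' (begin), 'I' (inside), or 'O' (outside).

    Lexicon hits are tagged 'B' directly. Every other token is passed to
    classify_ambiguous(tokens, i) if one is supplied (a stand-in for the
    learned classifier), and otherwise tagged 'O'.
    """
    tags = []
    for i, tok in enumerate(tokens):
        if tok.lower() in UNAMBIGUOUS_CUES:
            tags.append("B")
        elif classify_ambiguous is not None:
            tags.append(classify_ambiguous(tokens, i))
        else:
            tags.append("O")
    return tags


tokens = "There is no evidence of pneumonia".split()
print(list(zip(tokens, tag_cues(tokens))))
```

In this sketch multi-word signals (which would need 'I' tags) can only come from the callback, since the lexicon holds single words.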
Scope is decided by classifying each token as the first element of the scope, the last, or neither, using three classifiers: a kNN, an SVM, and a CRF. A CRF meta-classifier then takes their predictions, along with additional features, to assign the final scope tags.
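To make the first/last/neither encoding concrete, here is a small sketch of how per-token boundary tags could be turned back into a scope span. The tag names (`FIRST`, `LAST`, `NONE`), the function `scope_span`, and the example sentence with its tags are assumptions made for illustration; how the paper resolves malformed tag sequences is not covered.

```python
def scope_span(tags):
    """Convert per-token boundary tags into an inclusive (start, end) span.

    tags: list of 'FIRST', 'LAST', or 'NONE', one per token.
    Returns None when no well-formed FIRST ... LAST pair is found.
    """
    if "FIRST" not in tags:
        return None
    start = tags.index("FIRST")
    # Look for the closest LAST tag to the right of the FIRST tag.
    for end in range(start + 1, len(tags)):
        if tags[end] == "LAST":
            return (start, end)
    return None


tokens = ["The", "patient", "has", "no", "signs", "of", "infection", "."]
tags = ["NONE", "NONE", "NONE", "FIRST", "NONE", "NONE", "LAST", "NONE"]
span = scope_span(tags)
if span is not None:
    start, end = span
    print(tokens[start:end + 1])
```

Because each token carries exactly one tag, this encoding cannot express a one-token scope without some extra convention.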
Determining scope accurately turns out to be a fairly difficult task: the PCS measure (which credits a scope only if the whole scope is correct) ranges from 0.40 to 0.70 depending on the data set. Token-level F1 is better, from 0.70 to 0.85. This does not compare well to the regex-based medical concept negation classifiers mentioned above (F1 ~0.95), but the comparison is somewhat apples to oranges, since those systems assign a negation status to a given concept rather than delimit a scope.
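The gap between the two measures is easier to see with a toy calculation. The sketch below assumes each scope is represented as a set of token indices and that gold and predicted scopes are aligned one to one; the function names and the toy data are hypothetical, and the paper's exact evaluation procedure may differ.

```python
def pcs(gold_scopes, pred_scopes):
    """Fraction of scopes predicted exactly (all-or-nothing per scope)."""
    exact = sum(1 for g, p in zip(gold_scopes, pred_scopes) if set(g) == set(p))
    return exact / len(gold_scopes)


def token_f1(gold_scopes, pred_scopes):
    """F1 over individual in-scope tokens, pooled across all scopes."""
    tp = fp = fn = 0
    for g, p in zip(gold_scopes, pred_scopes):
        g, p = set(g), set(p)
        tp += len(g & p)   # tokens correctly placed in scope
        fp += len(p - g)   # tokens wrongly placed in scope
        fn += len(g - p)   # in-scope tokens that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: the first predicted scope misses a single token.
gold = [{3, 4, 5, 6}, {1, 2}]
pred = [{3, 4, 5}, {1, 2}]
print(pcs(gold, pred))                  # 0.5
print(round(token_f1(gold, pred), 3))   # 0.909
```

One missed token costs the whole scope under PCS but only a little token-level recall, which is why the token-level numbers run higher.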
Related Papers
Regex- and rule-based negation classifiers frequently used in the clinical domain include Chapman et al J Biomed Inform 2001, Mutalik et al J Am Med Inform Assoc 2001, and Elkin et al BMC Medical Informatics and Decision Making 2005.
Councill et al Workshop on Negation and Speculation in NLP 2010 followed up with a simpler CRF model that also performs well on sentiment analysis.
The dataset consisted of clinical reports, biomedical texts, and biomedical abstracts annotated for negation and uncertainty scopes by Vincze et al BMC Bioinformatics 2008.