S. Patwardhan and E. Riloff. EMNLP 2009
Citation
S. Patwardhan and E. Riloff. A unified model of phrasal and sentential evidence for information extraction. In EMNLP 2009.
Online version
Summary
Previous IE systems make extraction decisions based only on the immediate context around a phrase. The authors argue that more complex tasks, such as event extraction, often need a larger field of view to understand how facts tie together. This paper proposes a new model for event extraction. To determine whether a noun phrase (NP) should be extracted as a filler for an event role, the model computes the joint probability that the NP:
- appears in an event sentence, and
- is a legitimate filler for the event role.
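A minimal sketch of how the two pieces of evidence could be combined, assuming the joint probability is factored by the chain rule into a sentence-level term and a phrase-level term conditioned on it. The function names and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
def joint_probability(p_event_sentence: float, p_filler_given_event: float) -> float:
    """Chain rule: P(event sentence, filler) = P(event sentence) * P(filler | event sentence)."""
    for p in (p_event_sentence, p_filler_given_event):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_event_sentence * p_filler_given_event


def should_extract(p_event_sentence: float,
                   p_filler_given_event: float,
                   threshold: float = 0.5) -> bool:
    """Extract the NP when the joint probability reaches a (hypothetical) threshold."""
    return joint_probability(p_event_sentence, p_filler_given_event) >= threshold
```

Under this sketch, an NP that looks like a strong filler locally is still rejected when its sentence is unlikely to describe a relevant event, for example `should_extract(0.2, 0.9)` is false because the joint probability is only 0.18.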
To compute the probability that a sentence describes a relevant event, they use an SVM. Because an SVM is not a probabilistic classifier, the authors use the margin as an indicator of confidence, and they report that this worked well. The features are named entities, lexico-syntactic patterns, sentence length, bag of words, and verb tense.
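One common way to turn a signed SVM margin into a confidence score in (0, 1) is to pass it through a logistic function. The sketch below illustrates that idea only; the summary does not say which mapping the authors used, so the function name and the `scale` parameter are assumptions.

```python
import math


def margin_to_confidence(margin: float, scale: float = 1.0) -> float:
    """Squash a signed distance from the SVM hyperplane into (0, 1).

    Assumption: a plain logistic mapping; `scale` is a hypothetical knob
    that controls how quickly confidence saturates.
    """
    z = scale * margin
    # Numerically stable logistic: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

A sentence on the decision boundary (margin 0) gets confidence 0.5, and confidence grows monotonically with the margin.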
To determine whether a noun phrase is a legitimate filler for a specific type of event role based on its local context, the authors use a Naive Bayes classifier. The features include lexical matches, semantic features, and syntactic relations.
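The phrase-level classifier can be illustrated with a small Naive Bayes model over symbolic features. This is a sketch under stated assumptions, not the authors' implementation: the class name `TinyNaiveBayes`, the add-one smoothing, and feature strings such as `sem=HUMAN` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over string features with additive smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                      # smoothing constant (assumed)
        self.class_counts = Counter()           # label -> number of examples
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()                      # all features seen in training

    def fit(self, examples):
        """examples: iterable of (list_of_feature_strings, label)."""
        for features, label in examples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict_proba(self, features):
        """Return {label: posterior probability} for one noun phrase."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocab))
            score = math.log(n / total)  # log prior
            for f in features:
                if f in self.vocab:      # ignore features never seen in training
                    count = self.feature_counts[label][f]
                    score += math.log((count + self.alpha) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exps = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exps.values())
        return {k: v / z for k, v in exps.items()}
```

In use, each NP would be described by a handful of such features (for instance a semantic class and a syntactic relation to a verb), and the posterior for the "filler" label would serve as the phrase-level probability.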
To evaluate the new model, the authors tested it on two datasets, the MUC-4 terrorism corpus and the ProMed disease outbreaks corpus, and compared it with three baseline systems (two pattern-based systems and an NB classifier without sentence information).
On MUC-4, the unified IE model outperforms the three baselines and other systems on every role except PerpOrg. The authors attribute the weaker PerpOrg performance to mentions that appear far from the main event description: the sentential event recognizer tends to assign low probabilities to such sentences.
On ProMed, the unified IE model shows comparable results on the Victim role and improved precision on the Disease role.