S. Patwardhan and E. Riloff. EMNLP 2009

From Cohen Courses

Revision as of 04:55, 30 November 2010

Citation

S. Patwardhan and E. Riloff. A unified model of phrasal and sentential evidence for information extraction. In EMNLP 2009.

Online version

Unified model for IE

Summary

Previous IE systems make decisions based only on the immediate context around a phrase. The authors argue that for more complex tasks, such as event extraction, a wider field of view is often needed to understand how facts tie together. This paper proposes a new model for event extraction. To determine whether a noun phrase NP_i should be extracted as a filler for an event role, the model computes the joint probability that NP_i:

  1. appears in an event sentence, and
  2. is a legitimate filler for the event role.
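The decision rule above can be sketched in Python. The page does not say how the joint probability is factored; by the chain rule it equals P(event sentence) × P(legitimate filler | event sentence). This sketch assumes, for simplicity, that the local-context score from the phrasal classifier can stand in for the conditional term, so the two scores are simply multiplied. The `Candidate` class, the example scores, and the 0.5 extraction threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate noun phrase NP_i with its two model scores (hypothetical structure)."""
    text: str
    p_event_sentence: float    # P(the sentence containing NP_i describes a relevant event)
    p_plausible_filler: float  # P(NP_i is a legitimate role filler), from its local context


def joint_probability(c: Candidate) -> float:
    # Chain rule with a simplifying assumption: the local-context score is used
    # in place of P(filler | event sentence), so the two scores are multiplied.
    return c.p_event_sentence * c.p_plausible_filler


def extract(candidates: list[Candidate], threshold: float = 0.5) -> list[str]:
    """Return the NPs whose joint probability reaches a hypothetical threshold."""
    return [c.text for c in candidates if joint_probability(c) >= threshold]


candidates = [
    Candidate("the guerrillas", 0.9, 0.8),  # joint 0.72: extracted
    Candidate("the embassy", 0.9, 0.3),     # joint 0.27: rejected, weak local evidence
    Candidate("the rebels", 0.2, 0.95),     # joint 0.19: rejected, weak sentence evidence
]
print(extract(candidates))  # ['the guerrillas']
```

Neither score decides the outcome alone: a phrase that looks like a good filler locally is still rejected when its sentence is judged unlikely to describe the event.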

To estimate the probability that a sentence describes a relevant event, the authors use an SVM. Because an SVM is not a probabilistic classifier, they use its margin as an indicator of confidence, which they report worked well in practice. The features include named entities, lexico-syntactic patterns, sentence length, bag of words, and verb tense.
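A minimal sketch of the sentential component, assuming a linear SVM whose weights have already been learned. The feature names, the toy weights, and the logistic mapping from margin to a (0, 1) score are assumptions for illustration; the page only states that the margin serves as a confidence indicator, not how (or whether) it is rescaled.

```python
import math


def sentence_features(tokens, entity_types, patterns, past_tense):
    """Sparse features for one sentence (hypothetical feature names)."""
    feats = {f"bow={t.lower()}": 1.0 for t in tokens}     # bag of words
    feats.update({f"ne={e}": 1.0 for e in entity_types})  # named-entity types present
    feats.update({f"pat={p}": 1.0 for p in patterns})     # lexico-syntactic patterns matched
    feats["length"] = len(tokens) / 50.0                  # sentence length, roughly scaled
    feats["tense=past"] = 1.0 if past_tense else 0.0      # verb tense
    return feats


def svm_margin(feats, weights, bias):
    """Decision value w.x + b of a linear SVM; its sign gives the predicted class."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items()) + bias


def margin_to_confidence(margin, scale=1.0):
    """Squash the margin into (0, 1) with a logistic function (assumed mapping)."""
    return 1.0 / (1.0 + math.exp(-scale * margin))


# Toy, hand-set weights standing in for a trained model.
weights = {"bow=exploded": 1.2, "ne=LOCATION": 0.4, "pat=<subj> exploded": 0.9, "tense=past": 0.3}
feats = sentence_features(["A", "car", "bomb", "exploded", "in", "Lima"],
                          ["LOCATION"], ["<subj> exploded"], past_tense=True)
margin = svm_margin(feats, weights, bias=-1.0)
print(round(margin, 2), round(margin_to_confidence(margin), 3))  # 1.8 0.858
```

A monotone mapping like this keeps the SVM's ranking of sentences while producing a value that can be combined with the phrasal probability; fitting the mapping on held-out data (Platt scaling) would be a more principled choice.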


To determine from its local context whether a noun phrase can be a legitimate filler for a specific event role, the authors use a Naive Bayes classifier. The features include lexical matches, semantic features, and syntactic relations.
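The phrasal component can be sketched as a small multinomial-style Naive Bayes classifier with Laplace smoothing, treating each local-context feature of a noun phrase as conditionally independent given the class. The class name `NaiveBayesFiller`, the feature strings (head word, semantic class, syntactic relation), and the training examples are hypothetical; the paper's exact feature set and event model may differ.

```python
import math
from collections import Counter


class NaiveBayesFiller:
    """Binary NB: is a noun phrase a plausible filler for one event role? (sketch)"""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = Counter()
        self.feature_counts = {True: Counter(), False: Counter()}
        self.vocab = set()

    def fit(self, examples):
        """examples: (feature_set, is_filler) pairs; both classes must occur."""
        for feats, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def _log_joint(self, feats, label):
        """log P(label) + sum of log P(feature | label) over known features."""
        total = sum(self.class_counts.values())
        log_p = math.log(self.class_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
        for f in feats:
            if f in self.vocab:  # features never seen in training are ignored
                log_p += math.log((self.feature_counts[label][f] + self.alpha) / denom)
        return log_p

    def prob_filler(self, feats):
        """P(is_filler=True | feats), normalised over the two classes."""
        lp_true = self._log_joint(feats, True)
        lp_false = self._log_joint(feats, False)
        m = max(lp_true, lp_false)  # subtract the max for numerical stability
        e_t, e_f = math.exp(lp_true - m), math.exp(lp_false - m)
        return e_t / (e_t + e_f)


train = [
    ({"head=guerrillas", "sem=HUMAN", "syn=subj_of:attacked"}, True),
    ({"head=soldiers", "sem=HUMAN", "syn=subj_of:attacked"}, True),
    ({"head=monday", "sem=TIME", "syn=pp_on"}, False),
    ({"head=city", "sem=LOCATION", "syn=pp_in"}, False),
]
model = NaiveBayesFiller().fit(train)
p = model.prob_filler({"head=rebels", "sem=HUMAN", "syn=subj_of:attacked"})
print(round(p, 3))  # 0.9
```

Here the unseen head word "rebels" is skipped, and the semantic class and syntactic relation alone push the phrase toward the filler class.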

To evaluate the new model, the authors tested it on two datasets, the MUC-4 terrorism corpus and the ProMed disease outbreaks corpus, and compared it with three baseline systems: two pattern-based systems and a Naive Bayes classifier that uses no sentence-level information.

On MUC-4, the unified IE model outperforms the three baselines and other systems on every event role except PerpOrg. The authors attribute this weaker result to information that is discussed later in a document, far removed from the main event description: the sentential event recognizer tends to assign low probabilities to such sentences.
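Illustrative numbers (invented, not from the paper) show why this hurts, assuming the joint probability is the product P(event sentence) × P(legitimate filler). A phrase with strong local evidence in a sentence far from the main event description can score below a phrase with weaker local evidence in the central event sentence:

```latex
\begin{aligned}
\text{NP in a sentence far from the main description:}\quad & 0.2 \times 0.95 = 0.19 \\
\text{NP in the main event sentence:}\quad & 0.9 \times 0.6 = 0.54
\end{aligned}
```

With a hypothetical extraction threshold of 0.5, only the second phrase would be extracted, even though the first has the stronger local evidence.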

In ProMed,