Mintz et al., ACL-IJCNLP 2009

Citation

Mike Mintz, Steven Bills, Rion Snow, Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL-IJCNLP '09), Volume 2.

Online Version

http://dl.acm.org/citation.cfm?id=1690287

Summary

This paper addresses the problem of Relation Extraction; that is, given some corpus of unstructured text, they want to produce a set of relations involving entities that appear in the corpus. Their contribution is a novel way of using a knowledge base to perform this task. They use information from the knowledge base to learn weights in a classifier that maps entity pairs to relations, using aggregated lexical and syntactic cues as features. (This is very similar to Hoffmann et al., ACL 2011; indeed, Hoffmann et al. cite this paper as inspiration, saying that they are fixing some problems with its methods.)

Methods

This paper uses Freebase to get training data for a classifier that maps entity pairs to relations. They take a few hundred thousand instances of relations from Freebase, find sentences in a large corpus of unstructured text that mention both entities of an instance, and then assume that each such sentence probably expresses the relation in some way. That assumption certainly does not hold for every sentence, but averaged over millions of sentences the noise largely washes out. Once they have these sentences, they extract features from them, aggregate the features for each entity pair over the whole unlabeled corpus, and use them to learn a classifier (see the sketch below).
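
The labeling heuristic itself is easy to sketch. The snippet below is a minimal illustration (not the authors' code) of how Freebase relation instances might be matched against a sentence-segmented corpus to produce weakly labeled training examples; the data structures and the find_entities helper are hypothetical.

```python
from collections import defaultdict

def build_training_data(freebase, corpus, find_entities):
    """Collect, for every Freebase entity pair, the sentences mentioning both entities.

    freebase:      dict mapping (entity1, entity2) -> relation name
    corpus:        iterable of sentences, each a list of tokens
    find_entities: function(sentence) -> list of entity mention strings (e.g. from an NER tagger)

    Each matched sentence is assumed (noisily) to express the Freebase relation.
    """
    examples = defaultdict(list)  # (e1, e2, relation) -> supporting sentences
    for sentence in corpus:
        mentions = find_entities(sentence)
        for e1 in mentions:
            for e2 in mentions:
                if e1 == e2:
                    continue
                relation = freebase.get((e1, e2))
                if relation is not None:
                    examples[(e1, e2, relation)].append(sentence)
    return examples
```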

They have two main kinds of features that they extract from individual sentences: lexical and syntactic. Lexical features simply look at the words in between the two entities, taking the entire phrase, including the entities themselves, as the feature. These are very specific features, but because they have so much unlabeled data, sparsity isn't as much of an issue. Syntactic features look at the path between the two entities in a dependency parse, including a window on either side. Both kinds of features are aggregated, for each entity pair, over the whole corpus, and the classifier is learned from the data.
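
As a rough illustration of the lexical features, the sketch below conjoins the two entity mentions with the words between them into a single feature string and unions such features across all of a pair's sentences; the real feature templates in the paper are richer (entity types, word windows, dependency paths), and the helper names here are hypothetical.

```python
def lexical_feature(tokens, e1_span, e2_span):
    """Build one highly specific lexical feature string for a single sentence.

    e1_span and e2_span are (start, end) token indices of the two entity mentions,
    with e1 assumed to appear before e2 in the sentence.
    """
    e1 = "_".join(tokens[e1_span[0]:e1_span[1]])
    e2 = "_".join(tokens[e2_span[0]:e2_span[1]])
    between = " ".join(tokens[e1_span[1]:e2_span[0]])  # words between the two mentions
    return "LEX|{}|{}|{}".format(e1, between, e2)

def aggregate_features(sentences_with_spans):
    """Union the per-sentence features for one entity pair into a single feature set."""
    feats = set()
    for tokens, e1_span, e2_span in sentences_with_spans:
        feats.add(lexical_feature(tokens, e1_span, e2_span))
    return feats
```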

At testing time, they find entity mentions, do a dependency parse of the text, and aggregate features the same way as before. Then, for each pair of entities that were mentioned together in a sentence, they run their classifier to predict which relation the two entities participate in. This limits them to a single relation per entity pair (a deficiency addressed in Hoffmann et al., ACL 2011), but they are still able to get decent recall.
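
The paper's classifier is a multiclass logistic regression over the aggregated features. The toy example below sketches that setup with scikit-learn; the feature strings, labels, and training examples are made up for illustration and are not from the paper.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Toy aggregated examples: one row per entity pair, with features unioned over
# all sentences mentioning that pair (feature strings are hypothetical).
train_examples = [
    ({"LEX|Spielberg|directed|Saving_Private_Ryan": 1}, "film-director"),
    ({"LEX|Obama|was_born_in|Honolulu": 1}, "place-of-birth"),
    ({"LEX|Obama|visited|Honolulu": 1}, "NO_RELATION"),
]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform([feats for feats, _ in train_examples])
y = [label for _, label in train_examples]

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X, y)

# At test time, aggregate features for a new entity pair the same way,
# then predict a single relation for that pair.
test_feats = {"LEX|Spielberg|directed|Saving_Private_Ryan": 1}
print(classifier.predict(vectorizer.transform([test_feats]))[0])
```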

Experimental Results

They evaluate their method by holding out half of the relation instances from Freebase and seeing how many of them they can predict from the text. It's not a perfect evaluation, as presumably not all of the held-out relation instances will be manifested in their test corpus, and not all of the true relation instances in the corpus will be found in Freebase, but it provides a rough approximation. They found that they could get 80% precision at 5% recall, dropping to about 55% precision at a recall of 25%.
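
The bookkeeping for this kind of held-out evaluation can be sketched as follows; this is an illustration of the idea, not the authors' exact protocol, and the function name is hypothetical.

```python
def precision_at_recall_levels(ranked_predictions, heldout_facts, levels=(0.05, 0.25)):
    """Precision at the cutoff where each target recall level is first reached.

    ranked_predictions: list of (entity1, entity2, relation) triples,
                        sorted by classifier confidence (most confident first)
    heldout_facts:      set of (entity1, entity2, relation) triples withheld from training
    """
    results = {}
    correct = 0
    for i, triple in enumerate(ranked_predictions, start=1):
        if triple in heldout_facts:
            correct += 1
        recall = correct / len(heldout_facts)
        for level in levels:
            if level not in results and recall >= level:
                results[level] = correct / i  # precision at this cutoff
    return results
```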

They also sampled from their predicted relation instances and had humans evaluate the precision on Mechanical Turk. Doing this, they found a precision of about 68%.

Related Papers

Rahman and Ng, ACL 2011

Hoffmann et al., ACL 2011