Fader et al EMNLP 2011

From Cohen Courses
Revision as of 01:49, 30 September 2011 by Aanavas (talk | contribs) (→‎Summary)

Citation

Fader, A., Soderland, S. and Etzioni, O. 2011. Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.

Online version

University of Washington

Summary

This paper introduces REVERB, an Open Information Extraction system that outperforms previous extractors such as TEXTRUNNER and WOE in both precision and recall. Two frequent types of errors in previous systems motivated this new extractor: incoherent extractions (where the extracted phrase has no meaningful interpretation) and uninformative extractions (where the extraction omits critical information). To avoid these problems, REVERB articulates two simple constraints on how binary relationships are expressed.

A syntactic constraint is proposed to avoid incoherent relation phrases: a valid relation phrase should be either a verb, a verb followed immediately by a preposition, or a verb followed by nouns, adjectives or adverbs and ending in a preposition. This constraint also reduces uninformative extractions, but it sometimes matches relation phrases that are so specific that they occur with only a few instances. A lexical constraint is introduced to overcome this limitation: a valid relation phrase should take many distinct arguments in a large corpus.
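The two constraints can be illustrated with a minimal Python sketch. This is not the REVERB implementation: it only mirrors the description in this summary. The helper names (coarse_class, satisfies_syntactic_constraint, satisfies_lexical_constraint), the mapping from Penn Treebank tags to coarse classes (including treating both IN and TO as prepositions), and the min_distinct_pairs threshold are all assumptions made for illustration.

```python
import re

def coarse_class(tag):
    """Map a Penn Treebank POS tag to a one-letter class (hypothetical mapping)."""
    if tag.startswith("VB"):
        return "V"  # any verb form
    if tag in ("IN", "TO"):
        return "P"  # assumption: IN and TO both count as prepositions
    if tag.startswith(("NN", "JJ", "RB")):
        return "W"  # noun, adjective or adverb
    return "O"      # anything else

# Simplified pattern from the summary: V | V P | V W* P
RELATION_PATTERN = re.compile(r"V(?:W*P)?")

def satisfies_syntactic_constraint(pos_tags):
    """True if the tag sequence of a candidate phrase matches the pattern."""
    classes = "".join(coarse_class(tag) for tag in pos_tags)
    return RELATION_PATTERN.fullmatch(classes) is not None

def satisfies_lexical_constraint(relation, extractions, min_distinct_pairs):
    """True if `relation` occurs with enough distinct argument pairs.

    `extractions` is an iterable of (arg1, relation, arg2) triples;
    `min_distinct_pairs` is a hypothetical threshold, not a value from the paper.
    """
    pairs = {(a1, a2) for a1, rel, a2 in extractions if rel == relation}
    return len(pairs) >= min_distinct_pairs

# Toy data, invented for this example.
toy_extractions = [
    ("Edison", "invented", "the light bulb"),
    ("Bell", "invented", "the telephone"),
    ("Edison", "invented", "the light bulb"),  # duplicate pair, counted once
    ("X", "made a deal with", "Y"),
]
```

With this sketch, the tag sequences for "invented" (VBD), "located in" (VBN IN) and "has atomic weight of" (VBZ JJ NN IN) pass the syntactic check, while a phrase tagged VBZ DT does not. On the toy data, "invented" passes the lexical check with a threshold of 2, whereas "made a deal with" fails because it appears with a single argument pair.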

The new extraction algorithm

Experimental results

...

Related papers

REVERB is compared against two other open IE systems: TextRunner, described in Banko et al IJCAI 2007, and WOE, presented in Wu and Weld ACL 2010.