Difference between revisions of "Fader et al EMNLP 2011"
Revision as of 01:49, 30 September 2011
Citation
Fader, A., Soderland, S. and Etzioni, O. 2011. Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Online version
Summary
This paper introduces REVERB, an Open Information Extraction system that outperforms previous extractors such as TEXTRUNNER and WOE in precision and recall. Two frequent types of errors in previous systems motivated the new extractor: incoherent extractions (where the extracted phrase has no meaningful interpretation) and uninformative extractions (where the extraction omits critical information). To avoid these problems, REVERB articulates two simple constraints on how binary relationships are expressed.
A syntactic constraint is proposed to avoid incoherent relation phrases: a valid relation phrase should be either a verb, a verb followed by a preposition, or a verb followed by nouns, adjectives or adverbs and ending in a preposition. This constraint also reduces uninformative extractions, but it sometimes matches relation phrases that are too specific and therefore have few instances. A lexical constraint is introduced to overcome this limitation: a valid relation phrase should take many distinct arguments in a large corpus.
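The two constraints can be illustrated with a minimal sketch. It is not the authors' implementation: the coarse tag names (VERB, NOUN, ADJ, ADV, PREP), the one-letter encoding, the function names, the use of distinct argument pairs as the count, and the threshold value are all assumptions made for illustration.

```python
import re

# Hypothetical one-letter codes for a coarse POS tag set (an assumption,
# not the tag set used in the paper).
TAG_CODES = {"VERB": "V", "NOUN": "N", "ADJ": "J", "ADV": "R", "PREP": "P"}

# verb | verb prep | verb (noun|adj|adv)* prep
RELATION_PATTERN = re.compile(r"V(?:[NJR]*P)?")


def satisfies_syntactic_constraint(tags):
    """Return True if the whole POS tag sequence matches the pattern."""
    # Tags outside the coarse set are encoded as "?" and never match.
    encoded = "".join(TAG_CODES.get(tag, "?") for tag in tags)
    return RELATION_PATTERN.fullmatch(encoded) is not None


def satisfies_lexical_constraint(phrase, extractions, min_distinct_args=2):
    """Return True if `phrase` occurs with enough distinct argument pairs.

    `extractions` is an iterable of (arg1, relation_phrase, arg2) triples.
    `min_distinct_args` is an illustrative threshold, not a value taken
    from the paper.
    """
    distinct_pairs = {
        (arg1, arg2) for arg1, rel, arg2 in extractions if rel == phrase
    }
    return len(distinct_pairs) >= min_distinct_args
```

Under these assumptions, a tag sequence such as VERB NOUN PREP passes the syntactic check while VERB NOUN does not, and a relation phrase seen with only one argument pair in the corpus is rejected by the lexical check as too specific.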
The new extraction algorithm
Experimental results
...
Related papers
REVERB is compared against two other open IE systems: TextRunner, described in Banko et al IJCAI 2007, and WOE, presented in Wu and Weld ACL 2010.