Fader et al EMNLP 2011
Revision as of 02:07, 30 September 2011
Citation
Fader, A., Soderland, S. and Etzioni, O. 2011. Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.
Online version
Summary
This paper introduces REVERB, an Open Information Extraction system that outperforms previous extractors such as TEXTRUNNER and WOE in both precision and recall. Two frequent types of error in previous systems motivated this new extractor: incoherent extractions (where the extracted relation phrase has no meaningful interpretation) and uninformative extractions (where the extractions omit critical information). REVERB articulates two simple constraints on how binary relationships are expressed in order to avoid these problems.
A syntactic constraint is proposed to avoid incoherent relation phrases: a valid relation phrase should be either a verb, a verb followed by a preposition, or a verb followed by nouns, adjectives or adverbs and ending in a preposition. This constraint also reduces uninformative extractions, but it sometimes matches relation phrases that are so specific that they have only a few instances. A lexical constraint is introduced to overcome this limitation: a valid relation phrase should take many distinct arguments in a large corpus.
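The two constraints can be sketched in Python. This is a minimal illustration, not REVERB's implementation. It assumes Penn Treebank POS tags, maps them to coarse classes of our own choosing (V, W, P), and encodes the syntactic constraint as the regular expression V+(W*P)? over those classes. The following choices are also assumptions of the sketch: allowing several consecutive verbs, so that auxiliaries like "was born" count, and letting determiners and pronouns fill the middle slot, which "is a suburb of" requires. The lexical constraint is approximated by counting distinct (x, y) argument pairs per relation phrase. `lexically_valid` and its `min_distinct_args` threshold are hypothetical names, and the toy extractions stand in for a large corpus.

```python
import re
from collections import defaultdict

def tag_class(tag: str) -> str:
    """Map a Penn Treebank tag to a coarse class (assumed mapping)."""
    if tag.startswith("VB"):
        return "V"  # any verb form
    if tag in ("IN", "TO", "RP"):
        return "P"  # preposition, infinitive marker "to", particle
    if tag.startswith(("NN", "JJ", "RB")) or tag in ("DT", "PRP", "PRP$"):
        return "W"  # noun, adjective, adverb, determiner, pronoun
    return "O"      # anything else breaks the pattern

# One or more verbs, optionally followed by W* and a closing preposition (assumed form).
SYNTACTIC_PATTERN = re.compile(r"V+(?:W*P)?")

def satisfies_syntactic(tags: list[str]) -> bool:
    """True if the tag sequence of a candidate relation phrase matches the pattern."""
    classes = "".join(tag_class(t) for t in tags)
    return SYNTACTIC_PATTERN.fullmatch(classes) is not None

def lexically_valid(extractions, min_distinct_args):
    """Hypothetical helper: keep relation phrases seen with enough distinct (x, y) pairs."""
    args_by_relation = defaultdict(set)
    for x, r, y in extractions:
        args_by_relation[r.lower()].add((x.lower(), y.lower()))
    return {r for r, pairs in args_by_relation.items() if len(pairs) >= min_distinct_args}

print(satisfies_syntactic(["VBD", "VBN", "IN"]))       # "was born in"    -> True
print(satisfies_syntactic(["VBZ", "DT", "NN", "IN"]))  # "is a suburb of" -> True
print(satisfies_syntactic(["VBZ", "DT", "NN"]))        # "is a suburb"    -> False

# Toy stand-in for a large corpus of candidate extractions.
toy = [("Hudson", "was born in", "Hampstead"),
       ("Ada", "was born in", "London"),
       ("Hampstead", "is a suburb of", "London"),
       ("Hampstead", "is a suburb of", "London")]
print(lexically_valid(toy, min_distinct_args=2))  # {'was born in'}
```

The duplicated "is a suburb of" extraction counts only once, so with a threshold of 2 that phrase is filtered out. In a real corpus the threshold would be much larger.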
Brief description of the extraction algorithm
The new extraction algorithm takes as input a POS-tagged and NP-chunked sentence and returns a set of <x, r, y> extraction triples. Given an input sentence s, the algorithm performs two steps:
- For each verb v in s, find the longest sequence of words r_v such that r_v starts at v and satisfies both the syntactic and the lexical constraints. If any pair of matches is adjacent or overlaps in s, merge them into a single match.
- For each relation phrase r_v, find the nearest noun phrase x to the left of r_v in s such that x is not a relative pronoun, WHO-pronoun or existential "there". Then find the nearest noun phrase y to the right of r_v in s. If such a pair (x, y) is found, return <x, r_v, y> as an extraction.
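The two steps above can be sketched as a single Python function. This is a hypothetical illustration under stated assumptions, not the authors' code. The sentence is a list of (word, Penn Treebank tag) pairs, and NP chunks are given as [start, end) token spans. The syntactic constraint is approximated by the regex V+(W*P)? over coarse tag classes. The lexical constraint is a caller-supplied predicate, `lexical_ok`, a hypothetical parameter that accepts everything by default. A one-token NP tagged WDT, WP, WP$ or EX is treated as the relative pronoun, WH-pronoun or existential "there" that may not serve as x; this tag mapping is also an assumption.

```python
import re
from typing import Callable

Token = tuple[str, str]  # (word, Penn Treebank POS tag)
Span = tuple[int, int]   # [start, end) token offsets

def tag_class(tag: str) -> str:
    """Coarse tag class used by the assumed syntactic pattern."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO", "RP"):
        return "P"
    if tag.startswith(("NN", "JJ", "RB")) or tag in ("DT", "PRP", "PRP$"):
        return "W"
    return "O"

SYNTACTIC = re.compile(r"V+(?:W*P)?")
EXCLUDED_X_TAGS = {"WDT", "WP", "WP$", "EX"}  # assumed: relative/WH-pronouns, existential "there"

def extract(tokens: list[Token], np_chunks: list[Span],
            lexical_ok: Callable[[str], bool] = lambda r: True) -> list[tuple[str, str, str]]:
    classes = "".join(tag_class(tag) for _, tag in tokens)  # one class char per token
    words = [w for w, _ in tokens]

    # Step 1: for each verb, the longest phrase passing both constraints.
    matches: list[Span] = []
    for i, c in enumerate(classes):
        if c != "V":
            continue
        for j in range(len(tokens), i, -1):  # try the longest candidate first
            if SYNTACTIC.fullmatch(classes[i:j]) and lexical_ok(" ".join(words[i:j])):
                matches.append((i, j))
                break

    # Merge matches that overlap or are adjacent.
    merged: list[Span] = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    def text(span: Span) -> str:
        return " ".join(words[span[0]:span[1]])

    def excluded_as_x(np: Span) -> bool:
        start, end = np
        return end - start == 1 and tokens[start][1] in EXCLUDED_X_TAGS

    # Step 2: nearest allowed NP on the left (x) and nearest NP on the right (y).
    triples = []
    for a, b in merged:
        left = sorted((np for np in np_chunks if np[1] <= a), key=lambda np: np[1], reverse=True)
        x = next((np for np in left if not excluded_as_x(np)), None)
        right = sorted((np for np in np_chunks if np[0] >= b), key=lambda np: np[0])
        y = right[0] if right else None
        if x is not None and y is not None:
            triples.append((text(x), text((a, b)), text(y)))
    return triples

sentence = [("Hudson", "NNP"), ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
            ("Hampstead", "NNP"), (",", ","), ("which", "WDT"), ("is", "VBZ"),
            ("a", "DT"), ("suburb", "NN"), ("of", "IN"), ("London", "NNP"), (".", ".")]
chunks = [(0, 1), (4, 5), (6, 7), (8, 10), (11, 12)]
for triple in extract(sentence, chunks):
    print(triple)
# ('Hudson', 'was born in', 'Hampstead')
# ('Hampstead', 'is a suburb of', 'London')
```

On the Hudson sentence, the matches starting at "was" and "born" overlap and are merged. When choosing x for "is a suburb of", the one-token NP "which" is skipped, so the left argument becomes "Hampstead".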
For example, the sentence:
Hudson was born in Hampstead, which is a suburb of London.
returns two valid extractions:
e1: <Hudson, was born in, Hampstead>
e2: <Hampstead, is a suburb of, London>
Experimental results
...
Related papers
REVERB is compared against two other Open IE systems: TEXTRUNNER, described in Banko et al IJCAI 2007, and WOE, presented in Wu and Weld ACL 2010.