Difference between revisions of "Fader et al EMNLP 2011"

From Cohen Courses

Revision as of 02:04, 30 September 2011

Citation

Fader, A., Soderland, S. and Etzioni, O. 2011. Identifying Relations for Open Information Extraction. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing.

Online version

University of Washington

Summary

This paper introduces REVERB, an Open Information Extraction system that outperforms previous extractors such as TEXTRUNNER and WOE in both precision and recall. Two frequent types of errors in previous systems motivated this new extractor: incoherent extractions (where the extracted relation phrase has no meaningful interpretation) and uninformative extractions (where the extraction omits critical information). REVERB articulates two simple constraints on how binary relations are expressed in order to avoid these problems.

A syntactic constraint is proposed to avoid incoherent relation phrases: a valid relation phrase should be either a verb, a verb followed by a preposition, or a verb followed by nouns, adjectives or adverbs and ending in a preposition. This constraint also reduces uninformative extractions, but it sometimes matches relation phrases that are so specific that they occur with only a few instances. A lexical constraint is introduced to overcome this limitation: a valid relation phrase should take many distinct arguments in a large corpus.
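The two constraints can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. The helper names (`coarse`, `longest_relation`, `build_lexicon`), the use of Penn Treebank tags, the inclusion of determiners among the words allowed between the verb and the preposition, and the threshold parameter `min_distinct_args` are all assumptions made for the example.

```python
import re
from collections import defaultdict


def coarse(tag):
    """Map a Penn Treebank POS tag to a one-letter class (hypothetical mapping)."""
    if tag.startswith("VB"):
        return "V"  # verb
    if tag in ("IN", "TO"):
        return "P"  # preposition
    if tag.startswith(("NN", "JJ", "RB")) or tag == "DT":
        return "W"  # noun, adjective, adverb; DT is an assumption of this sketch
    return "O"      # anything else


# Syntactic constraint: a verb, optionally followed by W-words and a closing preposition.
RELATION = re.compile(r"V(W*P)?")


def longest_relation(tags, start):
    """Return the exclusive end index of the longest match starting at `start`, or None."""
    letters = "".join(coarse(t) for t in tags)
    match = RELATION.match(letters, start)
    return match.end() if match else None


def build_lexicon(triples, min_distinct_args):
    """Lexical constraint: keep relation phrases seen with enough distinct (x, y) pairs."""
    args = defaultdict(set)
    for x, r, y in triples:
        args[r].add((x, y))
    return {r for r, pairs in args.items() if len(pairs) >= min_distinct_args}
```

For instance, with the tags of "is a suburb of London" (`["VBZ", "DT", "NN", "IN", "NNP"]`), `longest_relation(tags, 0)` covers the first four tokens, i.e. "is a suburb of". The lexicon would be built once from a large corpus of candidate triples and then used to reject overly specific relation phrases.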

Extraction algorithm

The new extraction algorithm takes as input a POS-tagged and NP-chunked sentence and returns a set of extraction triples. Given an input sentence s, the algorithm performs two steps:

  • For each verb v in s, find the longest sequence of words rv such that rv starts at v and rv satisfies both the syntactic and the lexical constraints. If any pair of matches is adjacent or overlaps in s, merge them into a single match.
  • For each relation phrase rv, find the nearest noun phrase x to the left, such that x is not a relative pronoun, WHO-pronoun or existential. Then, find the nearest noun phrase y to the right. For every <x, y> pair found, return <x, rv, y> as a valid extraction.

For example, the sentence:

Hudson was born in Hampstead, which is a suburb of London.

will return two valid extractions:

e1: <Hudson, was born in, Hampstead>
e2: <Hampstead, is a suburb of, London>
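The two steps can be sketched in Python on the example sentence. This is an illustrative sketch under several assumptions, not the authors' code: the POS tags and NP chunk spans are supplied by hand, the tag-to-class mapping (including determiners inside the relation phrase) is hypothetical, the `SKIP` word list stands in for the relative pronoun, WHO-pronoun and existential check, and the lexical constraint is omitted.

```python
import re


def coarse(tag):
    """Map a Penn Treebank POS tag to a one-letter class (hypothetical mapping)."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO"):
        return "P"
    if tag.startswith(("NN", "JJ", "RB")) or tag == "DT":
        return "W"  # DT is an assumption of this sketch
    return "O"


RELATION = re.compile(r"V(W*P)?")
SKIP = {"which", "who", "whom", "whose", "that", "there"}  # hypothetical word list


def relation_spans(tags):
    """Step 1: longest match from each verb; merge adjacent or overlapping matches."""
    letters = "".join(coarse(t) for t in tags)
    spans = []
    for i, letter in enumerate(letters):
        if letter != "V":
            continue
        end = RELATION.match(letters, i).end()
        if spans and i <= spans[-1][1]:
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((i, end))
    return spans


def extract(tokens, tags, np_spans):
    """Step 2: attach the nearest valid NP on the left and the nearest NP on the right."""
    def text(start, end):
        return " ".join(tokens[start:end])

    results = []
    for r_start, r_end in relation_spans(tags):
        left = [(s, e) for s, e in np_spans
                if e <= r_start and text(s, e).lower() not in SKIP]
        right = [(s, e) for s, e in np_spans if s >= r_start and s >= r_end]
        if left and right:
            x = max(left, key=lambda span: span[1])   # closest NP ending before rv
            y = min(right, key=lambda span: span[0])  # closest NP starting after rv
            results.append((text(*x), text(r_start, r_end), text(*y)))
    return results


tokens = "Hudson was born in Hampstead , which is a suburb of London .".split()
tags = ["NNP", "VBD", "VBN", "IN", "NNP", ",", "WDT",
        "VBZ", "DT", "NN", "IN", "NNP", "."]
np_spans = [(0, 1), (4, 5), (6, 7), (8, 10), (11, 12)]  # hand-made NP chunks
extractions = extract(tokens, tags, np_spans)
for triple in extractions:
    print(triple)
# ('Hudson', 'was born in', 'Hampstead')
# ('Hampstead', 'is a suburb of', 'London')
```

Note how "was" and "born in" are matched separately from each verb and then merged because they are adjacent, and how "which" is skipped as a left argument so that "Hampstead" is chosen for the second extraction.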

Experimental results

...

Related papers

REVERB is compared against two other open IE systems: TEXTRUNNER, described in Banko et al IJCAI 2007, and WOE, presented in Wu and Weld ACL 2010.