Apappu writeup on Banko et al.

Latest revision as of 10:42, 3 September 2010

This paper presents an open-domain information extraction (IE) system and compares it with KnowItAll, a state-of-the-art closed IE system.

The authors describe a scalable and computationally efficient way to extract relations from the Web. Their system has three essential components:

Self-Supervised Learner: uses a dependency parser to identify trustworthy relations, labels them as positive examples, and labels the rest as negative (similar to co-training?). The authors employ heuristics to decide what a trustworthy relation looks like.

Single-Pass Extractor: tags words with part-of-speech (POS) labels and filters out non-essential phrases (such as prepositional phrases). Each candidate tuple is then passed to the classifier.

Redundancy-Based Assessor: places similar tuples into normalized equivalence bins based on their arguments and predicates.
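The binning step can be illustrated with a minimal sketch. This is not the authors' implementation: the `DROPPABLE` word list, the `normalize` and `bin_tuples` helpers, and the sample tuples are all hypothetical, chosen only to show how normalized tuples fall into the same bin and how the bin counts reflect redundancy.

```python
from collections import Counter

# Hypothetical list of tokens dropped during normalization (an assumption,
# not the paper's actual normalization rules).
DROPPABLE = {"the", "a", "an", "really", "very", "also"}


def normalize(phrase):
    """Lowercase a phrase and drop tokens listed in DROPPABLE."""
    tokens = [t for t in phrase.lower().split() if t not in DROPPABLE]
    return " ".join(tokens)


def bin_tuples(tuples):
    """Count how often each normalized (arg1, predicate, arg2) key occurs."""
    counts = Counter()
    for arg1, pred, arg2 in tuples:
        key = (normalize(arg1), normalize(pred), normalize(arg2))
        counts[key] += 1
    return counts


# Made-up extractions for illustration only.
extractions = [
    ("Edison", "invented", "the phonograph"),
    ("Edison", "also invented", "the phonograph"),
    ("edison", "invented", "a phonograph"),
    ("Tesla", "worked for", "Edison"),
]

bins = bin_tuples(extractions)
for key, count in bins.most_common():
    print(count, key)
```

A tuple seen in many independent sentences ends up with a higher count, which is the kind of signal a redundancy-based assessor could use when scoring facts.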

To estimate the correctness of the facts, the authors manually inspected the extracted tuples and classified them by well-formedness.

They then discuss how to estimate the number of distinct facts among this very large set of extracted relations. This seems a somewhat improbable task, given that they do not have much information about co-reference, spelling variants, or metonyms of the arguments, or about the various senses of the predicate phrases.
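The concern can be made concrete with a small sketch. It assumes a purely surface-level key (lowercasing and whitespace collapsing); the `surface_key` helper and the sample mentions are hypothetical and are not taken from the paper.

```python
def surface_key(arg1, pred, arg2):
    """Build a key using only lowercasing and whitespace collapsing."""
    return tuple(" ".join(s.lower().split()) for s in (arg1, pred, arg2))


# Made-up mentions that all express the same underlying fact.
mentions = [
    ("Thomas Edison", "invented", "the phonograph"),
    ("T. Edison", "invented", "the phonograph"),
    ("Edison", "invented", "the phonograph"),
    ("Thomas  Edison", "Invented", "the phonograph"),
]

# Surface normalization merges only the first and last mention; the name
# variants remain separate "distinct" facts.
distinct = {surface_key(*m) for m in mentions}
print(len(distinct), "distinct keys for one underlying fact")
```

Without co-reference or name-variant resolution, a count of distinct keys like this overstates the number of distinct facts.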
