Nschneid writeup of Cohen 2000

From Cohen Courses

Latest revision as of 10:42, 3 September 2010

This is Nschneid's review of Cohen_2000_hardening_soft_information_sources

Given an automatically extracted "soft" database of possibly ambiguous referring strings and possibly duplicate entries, the problem is to filter the entries and resolve references to produce a "hard" database in which each entity and relation has a unique representation.
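To make the setup concrete, here is a toy illustration (all names and tuples are invented for this example, not from the paper): the soft database holds possibly duplicate tuples over ambiguous referring strings, and an interpretation maps each string to a canonical entity, producing a deduplicated hard database.

```python
# Hypothetical soft database: tuples over raw referring strings,
# with duplicates caused by spelling variants.
soft_db = [
    ("IBM", "headquartered_in", "Armonk"),
    ("I.B.M.", "headquartered_in", "Armonk"),
    ("International Business Machines", "founded_in", "1911"),
]

# One possible interpretation: all three variants refer to entity "e1".
interpretation = {
    "IBM": "e1",
    "I.B.M.": "e1",
    "International Business Machines": "e1",
}

def harden(soft_db, interpretation):
    """Rewrite each soft tuple under the interpretation and deduplicate."""
    hard = set()
    for subj, rel, obj in soft_db:
        hard.add((interpretation.get(subj, subj), rel,
                  interpretation.get(obj, obj)))
    return hard

print(sorted(harden(soft_db, interpretation)))
# → [('e1', 'founded_in', '1911'), ('e1', 'headquartered_in', 'Armonk')]
```

The two `headquartered_in` tuples collapse into one once the spelling variants are resolved to the same entity.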

The paper casts this as minimizing an MDL-style (minimum description length) objective, which is given a probabilistic interpretation in §3, shown to be NP-hard to optimize in §4, and approximated with a greedy algorithm in §5. No experiments are presented.
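The flavor of the greedy approximation can be sketched as follows. This is a deliberate simplification, not the paper's exact cost function or algorithm: here the cost charges `ALPHA` per tuple kept in the hard database plus `BETA` per string rewritten to a different canonical form, and the greedy loop repeatedly merges the pair of entities that most obviously lowers total cost.

```python
from itertools import combinations

# Toy MDL-style cost weights (assumed values for illustration only).
ALPHA, BETA = 1.0, 0.4

def cost(soft_db, mapping):
    """Description length: pay for hard tuples plus string rewrites."""
    hard = {(mapping.get(s, s), r, mapping.get(o, o)) for s, r, o in soft_db}
    rewrites = sum(1 for k, v in mapping.items() if k != v)
    return ALPHA * len(hard) + BETA * rewrites

def greedy_harden(soft_db):
    """Greedily merge entities while any merge lowers the total cost."""
    strings = sorted({x for s, r, o in soft_db for x in (s, o)})
    mapping = {s: s for s in strings}  # start: every string is its own entity
    best = cost(soft_db, mapping)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(set(mapping.values())), 2):
            trial = {k: (a if v == b else v) for k, v in mapping.items()}
            c = cost(soft_db, trial)
            if c < best:  # accept the first cost-reducing merge
                best, mapping, improved = c, trial, True
                break
    return mapping, best

db = [("IBM", "hq", "Armonk"), ("I.B.M.", "hq", "Armonk")]
mapping, final_cost = greedy_harden(db)
print(mapping["IBM"] == mapping["I.B.M."], final_cost)
# → True 1.4  (merging the variants saves one tuple at one rewrite's cost)
```

Merging "IBM" and "I.B.M." saves one hard tuple (−1.0) at the price of one rewrite (+0.4), so the greedy step accepts it; merging unrelated strings saves no tuples and is rejected. The NP-hardness result in §4 is what motivates settling for such a local search.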

  • Does this approach work well in practice? In particular, how does it handle erroneous entries?