Nschneid writeup of Cohen 2000

From Cohen Courses

This is Nschneid's review of Cohen_2000_hardening_soft_information_sources.

The input is an automatically extracted "soft" database whose referring strings may be ambiguous and whose entries may be duplicated. The task is to filter the entries and resolve the references, producing a "hard" database in which each entity and each relation has a unique representation.

Hardening is cast as optimizing an MDL-style objective: §3 gives the objective a probabilistic interpretation, §4 shows that optimizing it is NP-hard, and §5 approximates the optimum with a greedy algorithm. No experiments are presented.
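The objective and the greedy procedure are only named above, so here is a hypothetical, heavily simplified Python sketch of the general idea. It is not the paper's formulation or algorithm. It assumes a cost equal to the number of distinct tuples in the hard database (a crude "model size") plus a string-distance penalty for each reference mapped to a canonical name, and it searches with a greedy "drop" heuristic. The toy data, the `difflib` similarity, the `max_dist` threshold, and the names `dist`, `cost`, and `harden` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy "soft" database of (subject, relation, object) tuples.
# Some argument strings are variant references to the same entity.
SOFT_DB = [
    ("W. Cohen", "works_at", "AT&T Labs"),
    ("William Cohen", "works_at", "AT&T Labs"),
    ("H. Kautz", "works_at", "AT&T Labs"),
    ("H. Kautz", "works_at", "AT&T Labs-Research"),
]


def dist(ref, name):
    """Assumed cost of reading reference `ref` as canonical `name` (0.0 = identical)."""
    return 1.0 - SequenceMatcher(None, ref.lower(), name.lower()).ratio()


def cost(db, canon, max_dist=0.5):
    """Simplified MDL-style cost: hard-DB size plus reference-mapping cost.

    Each subject/object reference is assigned to its nearest canonical name
    (relation strings are left as-is). An assignment farther than `max_dist`
    is disallowed, which makes the whole solution cost infinite.
    Returns (total_cost, mapping).
    """
    refs = {r for s, _, o in db for r in (s, o)}
    mapping, map_cost = {}, 0.0
    for r in refs:
        # Iterate in sorted order so that ties are broken deterministically.
        best = min(sorted(canon), key=lambda c: dist(r, c))
        d = dist(r, best)
        if d > max_dist:
            return float("inf"), {}
        mapping[r] = best
        map_cost += d
    hard = {(mapping[s], rel, mapping[o]) for s, rel, o in db}
    return len(hard) + map_cost, mapping


def harden(db, max_dist=0.5):
    """Greedy 'drop' heuristic: start with every distinct reference as its own
    entity, then repeatedly drop the single canonical name whose removal
    lowers the cost the most; stop when no drop helps (a local optimum)."""
    canon = {r for s, _, o in db for r in (s, o)}
    best_cost, mapping = cost(db, canon, max_dist)
    while len(canon) > 1:
        drop = None
        for c in sorted(canon):
            new_cost, new_map = cost(db, canon - {c}, max_dist)
            if new_cost < best_cost - 1e-12:
                best_cost, mapping, drop = new_cost, new_map, c
        if drop is None:
            break
        canon = canon - {drop}
    hard = {(mapping[s], rel, mapping[o]) for s, rel, o in db}
    return hard, mapping, best_cost


if __name__ == "__main__":
    hard, mapping, total = harden(SOFT_DB)
    for t in sorted(hard):
        print(t)
    print(f"cost = {total:.3f}")
```

On this toy input, dropping a canonical name pays off only when it collapses tuples. As a result, the two Cohen strings are merged, the two AT&T strings are merged, and H. Kautz stays separate, leaving a hard database of two tuples. Because each step commits to a single drop, the search can stall in a local optimum, which is the usual price of a greedy approximation to an NP-hard objective.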

  • Does this approach work well in practice? In particular, how does it handle erroneous entries?