Nschneid writeup of Cohen 2000
This is Nschneid's review of Cohen_2000_hardening_soft_information_sources
Given an automatically extracted "soft" database of possibly ambiguous referring strings and possibly duplicate entries, the problem is to filter the entries and resolve references to produce a "hard" database in which each entity and relation has a unique representation.
This is done with an MDL-style objective, which is given a probabilistic interpretation in §3; optimizing it is shown to be NP-hard in §4 and approximated with a greedy algorithm in §5 (a minimal sketch of such a greedy pass appears after the question below). No experiments are presented.
- Does this approach work well in practice? In particular, how does it handle erroneous entries?
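Since the paper's pseudocode and cost function are not reproduced in this writeup, here is a minimal, illustrative sketch of a greedy hardening pass, not the paper's algorithm. The tuple format, the hand-set merge costs, and the particular MDL-style objective (a per-tuple cost for the hardened database plus the cost of the reference merges implied by the interpretation) are all assumptions made for the example.

```python
# Illustrative greedy "hardening" sketch (assumptions noted above; not the
# exact algorithm or cost function from Cohen 2000).

def harden_greedy(soft_tuples, merge_cost, tuple_cost=1.0):
    """Greedily merge referring strings while the MDL-style cost decreases.

    soft_tuples: list of (string, string) soft facts, possibly duplicated.
    merge_cost:  dict mapping frozenset({a, b}) -> cost of treating a and b
                 as the same hard object (assumed given, e.g. from string
                 similarity); strings must occur in soft_tuples, and pairs
                 not listed are never merged.
    tuple_cost:  cost charged per distinct tuple in the hardened database.
    """
    # Start with the identity interpretation: every string is its own object.
    canon = {s: s for t in soft_tuples for s in t}

    def resolve(s):
        # Follow merge pointers to the canonical representative.
        while canon[s] != s:
            s = canon[s]
        return s

    def cost():
        # MDL-style total: size of the hardened database plus the cost of
        # every candidate pair that the current interpretation co-refers.
        hard = {tuple(resolve(s) for s in t) for t in soft_tuples}
        merges = sum(c for pair, c in merge_cost.items()
                     if len({resolve(s) for s in pair}) == 1)
        return tuple_cost * len(hard) + merges

    improved = True
    while improved:
        improved = False
        current = cost()
        best = None
        # Try every candidate merge; remember the single best improvement.
        for pair in merge_cost:
            a, b = sorted(pair)
            ra, rb = resolve(a), resolve(b)
            if ra == rb:
                continue
            canon[rb] = ra              # tentatively merge
            delta = cost() - current
            canon[rb] = rb              # undo
            if delta < 0 and (best is None or delta < best[0]):
                best = (delta, ra, rb)
        if best is not None:
            _, ra, rb = best
            canon[rb] = ra              # commit the best merge
            improved = True

    return {tuple(resolve(s) for s in t) for t in soft_tuples}


if __name__ == "__main__":
    soft = [("IBM", "Armonk"), ("I.B.M.", "Armonk"), ("IBM Corp.", "Armonk")]
    costs = {frozenset({"IBM", "I.B.M."}): 0.2,
             frozenset({"IBM", "IBM Corp."}): 0.3}
    print(harden_greedy(soft, costs))   # -> {('IBM', 'Armonk')}
```

Each pass commits only the single best cost-reducing merge, so the loop terminates as soon as no candidate merge lowers the objective; the example collapses the three spellings of "IBM" into one hard tuple because the merge costs are cheaper than keeping duplicate facts.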