Nschneid writeup of Cohen 2000

This is Nschneid's review of Cohen_2000_hardening_soft_information_sources

Given an automatically extracted "soft" database of possibly ambiguous referring strings and possibly duplicate entries, the problem is to filter the entries and resolve references to produce a "hard" database in which each entity and relation has a unique representation.

This is done with an MDL-style objective (which is given a probabilistic interpretation in §3, shown to be NP-hard to optimize in §4, and approximated with a greedy algorithm in §5). No experiments are presented.
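To make the "MDL-style objective plus greedy approximation" idea concrete, here is a toy sketch. It is not Cohen et al.'s actual objective or algorithm. The cost function, the token-Jaccard distortion measure, the "most frequent string" canonical name, the `alpha` per-entity charge, and all function names are hypothetical. The sketch also handles only entity names and ignores relations. The cost charges `alpha` for each hard entity, plus the distortion of each soft reference from its entity's canonical name. The greedy loop merges whichever pair of clusters lowers that cost most, and it stops when no merge helps.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Hypothetical distortion: 1 - Jaccard similarity of lowercase token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def canonical(cluster: list[str]) -> str:
    """Hypothetical naming rule: most frequent string, ties broken alphabetically."""
    return min(set(cluster), key=lambda s: (-cluster.count(s), s))


def cost(clusters: list[list[str]], alpha: float) -> float:
    """Hypothetical MDL-style cost: alpha per hard entity plus the distortion
    of every soft reference from its entity's canonical name."""
    total = alpha * len(clusters)
    for c in clusters:
        name = canonical(c)
        total += sum(jaccard_distance(r, name) for r in c)
    return total


def greedy_harden(references: list[str], alpha: float = 1.0) -> list[list[str]]:
    """Greedily merge reference clusters while a merge strictly lowers the cost."""
    # Start with one cluster per distinct referring string.
    groups: dict[str, list[str]] = {}
    for r in references:
        groups.setdefault(r, []).append(r)
    clusters = list(groups.values())
    current = cost(clusters, alpha)
    while True:
        best = None
        # Try every pair of clusters and keep the cheapest merged configuration.
        for i, j in combinations(range(len(clusters)), 2):
            rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
            candidate_clusters = rest + [clusters[i] + clusters[j]]
            candidate = cost(candidate_clusters, alpha)
            if candidate < current and (best is None or candidate < best[0]):
                best = (candidate, candidate_clusters)
        if best is None:  # no merge improves the cost: local optimum
            return clusters
        current, clusters = best


print(greedy_harden(["IBM", "IBM", "IBM Corp", "Bell Labs"]))
```

With `alpha = 1.0`, merging "IBM Corp" into "IBM" saves one entity charge at a distortion of 0.5. Folding "Bell Labs" in as well would save another entity charge but would add a distortion of 1.0, so the loop stops there. Raising or lowering `alpha` trades duplicate entities against forced merges. That trade-off is also where the question of erroneous entries bites: a spurious string either survives as its own entity or gets absorbed into a wrong one.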

Does this approach work well in practice? In particular, how does it handle erroneous entries?
