Bbd writeup of hardening soft information sources

From Cohen Courses

This is a review of Cohen_2000_hardening_soft_information_sources by user:Bbd.

This paper addresses a problem with soft databases, which are created by heuristically extracting information from various sources and may therefore contain inconsistencies and duplicates. The authors present a formal model of a soft database as a noisy version of a hard database. They then infer the most likely hard database given a particular soft database.

They define a soft database as a set of instances of a fixed set of relations over a set of references inferred from the data. Hardening determines the co-reference relation between the references in the soft database, i.e., which references denote the same underlying entity.
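To make the soft/hard distinction concrete, here is a minimal sketch, assuming binary relations and a co-reference relation already given as a mapping from each reference to a canonical name. The function `harden` and the example tuples are hypothetical illustrations, not the paper's notation or data: hardening rewrites every reference to its canonical entity, and tuples that become identical collapse into one.

```python
from typing import Dict, List, Set, Tuple

# A soft tuple: (relation name, reference, reference). For simplicity this
# sketch assumes binary relations; references are raw extracted strings, so
# one entity may appear under several names.
SoftTuple = Tuple[str, str, str]


def harden(soft_db: List[SoftTuple], coref: Dict[str, str]) -> Set[SoftTuple]:
    """Rewrite each reference to its canonical entity and drop duplicates.

    `coref` is a hypothetical co-reference mapping (reference -> canonical
    name); references missing from it are treated as their own entity.
    """
    hard_db = set()
    for rel, a, b in soft_db:
        hard_db.add((rel, coref.get(a, a), coref.get(b, b)))
    return hard_db


# Hypothetical soft database: two tuples describe the same fact under
# different surface names.
soft_db = [
    ("works_at", "J. Smith", "Acme Corp"),
    ("works_at", "John Smith", "Acme Corporation"),
    ("works_at", "Jane Doe", "Acme Corp"),
]
coref = {"J. Smith": "John Smith", "Acme Corporation": "Acme Corp"}

print(sorted(harden(soft_db, coref)))
```

The three soft tuples reduce to two hard tuples, because the first two differ only in how their references are written.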

I liked the efficient greedy implementation they proposed for the NP-hard problem of finding the optimal hard database.
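The general flavor of a greedy approach to hardening can be sketched as follows. This is a stand-in, not the authors' algorithm: the paper's cost function comes from its noisy-channel model, whereas this sketch swaps in a hypothetical token-level Jaccard similarity and a fixed `threshold`. The names `jaccard` and `greedy_corefer` are illustrative only. Candidate pairs are visited best-first and merged with union-find until no remaining pair clears the threshold. It scores all pairs, so it is quadratic and does not reflect the efficiency tricks of the paper's implementation.

```python
import re
from itertools import combinations
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Hypothetical similarity: Jaccard overlap of lowercase word tokens."""
    ta, tb = set(re.findall(r"\w+", a.lower())), set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def greedy_corefer(refs: List[str], threshold: float = 0.5) -> Dict[str, str]:
    """Greedily merge references, most similar pairs first.

    Returns a mapping reference -> representative of its merged cluster.
    The similarity score and threshold are assumptions of this sketch.
    """
    parent = {r: r for r in refs}

    def find(r: str) -> str:
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Score each candidate pair once, then visit pairs best-first.
    pairs = sorted(
        ((jaccard(a, b), a, b) for a, b in combinations(refs, 2)),
        reverse=True,
    )
    for score, a, b in pairs:
        if score < threshold:
            break  # all remaining pairs score lower, so stop merging
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge b's cluster into a's
    return {r: find(r) for r in refs}


refs = ["John Smith", "John A. Smith", "Acme Corp", "Acme Corp.", "Jane Doe"]
print(greedy_corefer(refs))
```

On these hypothetical references, "Acme Corp." merges into "Acme Corp" and "John A. Smith" into "John Smith", while "Jane Doe" stays on its own; the resulting mapping could then be fed to a hardening step like the one sketched for the definition above.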