Wka writeup of Cohen and Carvalho 2005
From Cohen Courses
This is a review of cohen_2000_hardening_soft_information_sources by user:wka.
In a soft database (db), distinct identifiers may refer to the same entity. Hardening the db means determining which pairs of identifiers refer to the same real-world object. The paper casts hardening as an optimization problem: minimize the sum of the number of tuples in the resulting hard db and the cost of the co-reference assumptions made. Finding an optimal hardening is NP-hard, but the paper presents a greedy algorithm that finds a good hardening in almost linear time. The cost function to be minimized is derived probabilistically.
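To make the objective concrete, here is a minimal sketch. It assumes that the soft db is a list of tuples of identifier strings and that every candidate co-reference pair comes with a precomputed cost. The `merge_cost` input and all function names are hypothetical and do not come from the paper. The code merges identifiers greedily with a union-find structure and keeps a merge only if it removes more hard tuples than it costs. It illustrates the objective and is not the paper's near-linear algorithm.

```python
class UnionFind:
    """Disjoint-set forest tracking which identifiers are assumed co-referent."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(b)] = self.find(a)


def hard_size(tuples, uf):
    """Number of distinct tuples after replacing each identifier by its representative."""
    return len({tuple(uf.find(x) for x in t) for t in tuples})


def greedy_harden(tuples, merge_cost):
    """Greedily accept merges that lower |hard db| + total co-reference cost.

    tuples: the soft db, a list of tuples of identifiers.
    merge_cost: hypothetical dict mapping frozenset({a, b}) -> cost of
    assuming a and b refer to the same object.
    Returns the union-find structure and the final objective value.
    """
    ids = {x for t in tuples for x in t}
    uf = UnionFind(ids)
    total_cost = 0.0
    # Consider the cheapest co-reference assumptions first.
    for pair, cost in sorted(merge_cost.items(), key=lambda kv: kv[1]):
        a, b = tuple(pair)
        if uf.find(a) == uf.find(b):
            continue  # already merged transitively
        before = hard_size(tuples, uf)
        saved_parent = dict(uf.parent)  # snapshot so the merge can be undone
        uf.union(a, b)
        gain = before - hard_size(tuples, uf)  # hard tuples removed by this merge
        if gain > cost:
            total_cost += cost  # keep the merge: objective strictly decreases
        else:
            uf.parent = saved_parent  # reject the merge
    return uf, hard_size(tuples, uf) + total_cost


# Toy example: "IBM" and "I.B.M." should be merged, "IBM" and "Apple" should not.
soft_db = [("IBM", "Armonk"), ("I.B.M.", "Armonk"), ("Apple", "Cupertino")]
costs = {frozenset({"IBM", "I.B.M."}): 0.5, frozenset({"IBM", "Apple"}): 5.0}
uf, objective = greedy_harden(soft_db, costs)
print(objective)  # 2.5
```

Merging "IBM" with "I.B.M." collapses two tuples into one, which saves 1 at a cost of 0.5, so the merge is kept. Merging with "Apple" saves nothing and is rejected. The sketch recomputes the hard db size after every tentative merge, so it runs in roughly quadratic time. Reaching almost-linear time would require incremental bookkeeping of which tuples collapse.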
- It would have been good to include some experimental results (though is there a standard dataset for this kind of problem?).