Suranah writeup for Cohen 2000
It is a theoretical paper that analyzes the problem of extracting a more concrete, "harder" database from the noisier, less structured, and highly redundant information produced by information extraction techniques. The problem is modeled in graph-theoretic terms and shown to be NP-hard, and a greedy algorithm is proposed as an alternative.
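To make the idea of a greedy alternative concrete, here is a minimal sketch of a generic greedy merge over a weighted graph. It is not necessarily the algorithm from the paper. It assumes that nodes are references (for example, name strings) and that each edge weight is a hypothetical score for merging two references into one object. Pairs are merged from the highest weight down, stopping once weights are no longer positive, and a union-find structure tracks the groups. The function name greedy_merge and the weight encoding are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def greedy_merge(nodes: List[str], weights: Dict[Tuple[str, str], float]) -> List[Set[str]]:
    """Hypothetical greedy merge: commit positive-weight merges, best first."""
    # Union-find parent pointers: each reference starts as its own object.
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Visit candidate merges from highest to lowest weight.
    for (a, b), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        if w <= 0:
            break  # no remaining merge has a positive score
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # commit the merge; greedy, so it is never undone

    # Collect the final groups of references.
    groups: Dict[str, Set[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


refs = ["IBM", "I.B.M.", "Intl Business Machines", "Apple"]
weights = {
    ("IBM", "I.B.M."): 2.0,
    ("I.B.M.", "Intl Business Machines"): 1.0,
    ("IBM", "Apple"): -3.0,
}
print(greedy_merge(refs, weights))
```

With static pairwise weights, this sketch merges transitively and never checks the weights between groups that have already been merged. This is one reason the choice of candidate set and weighting scheme can matter.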
I was interested in the set I_pot and the different ways it could be constructed, which may or may not affect the resulting sub-optimal solution. I also wondered whether varying some of the weights w at runtime, based on a heuristic function of the merges made so far, would change the formulation and the solution.
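Both questions can be explored with a small sketch. Here, a restricted candidate set of pairs serves as a rough, hypothetical stand-in for I_pot, and the paper's definition may differ. Cluster-to-cluster scores are recomputed after every merge using a pluggable heuristic `link`. The names greedy_merge_dynamic, candidates, and link are all assumptions made for illustration. This is generic agglomerative merging, not the paper's method.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Pair = FrozenSet[str]


def greedy_merge_dynamic(
    nodes: List[str],
    weights: Dict[Pair, float],
    candidates: Set[Pair],
    link: Callable[[List[float]], float] = sum,
) -> List[FrozenSet[str]]:
    """Hypothetical greedy merge whose scores are re-evaluated after each merge."""
    clusters: List[FrozenSet[str]] = [frozenset([n]) for n in nodes]
    while True:
        best_score, best = 0.0, None
        for c1, c2 in combinations(clusters, 2):
            cross = [frozenset((a, b)) for a in c1 for b in c2]
            # Only clusters joined by at least one candidate pair may merge.
            if not any(p in candidates for p in cross):
                continue
            # Score every cross pair (missing weights count as 0), so earlier
            # merges change the scores seen by later ones.
            score = link([weights.get(p, 0.0) for p in cross])
            if score > best_score:
                best_score, best = score, (c1, c2)
        if best is None:
            return clusters  # no candidate merge has a positive score
        c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]


w = {
    frozenset(("a", "b")): 5.0,
    frozenset(("b", "c")): 4.0,
    frozenset(("a", "c")): -10.0,
}
print(greedy_merge_dynamic(["a", "b", "c"], w, set(w)))            # sum linkage
print(greedy_merge_dynamic(["a", "b", "c"], w, set(w), link=max))  # max linkage
print(greedy_merge_dynamic(["a", "b", "c"], w, {frozenset(("b", "c"))}))
```

On this toy input, sum linkage merges a with b and then refuses to add c, because the strongly negative a-c weight now counts against the merge. Max linkage ignores that penalty and merges all three. Restricting the candidate set to the single pair (b, c) produces a different grouping again. So in this sketch, both the candidate set and the runtime weighting heuristic change the outcome.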