Suranah writeup for Cohen 2000

From Cohen Courses

This is a review of the paper Cohen_2000_hardening_soft_information_sources by user:Suranah.

It is a theoretical paper that analyzes the problem of extracting a concrete, "hard" database from the noisier, less structured, and highly redundant "soft" information produced by information extraction techniques. The problem is modeled graph-theoretically and shown to be NP-hard, so a greedy alternative is proposed instead.
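
The greedy idea can be illustrated with a small sketch. It assumes a simplified cost model that may differ from the paper's exact formulation: each candidate merge in I_pot is an arc (a, b, w) between two references, accepting it costs its weight w, and every distinct tuple in the resulting hard database costs a fixed `tuple_cost`. The function names and toy data are hypothetical.

```python
from itertools import chain


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def build_parent(refs, arcs):
    """Map every reference to a representative, given the accepted merge arcs."""
    parent = {r: r for r in refs}
    for a, b, _ in arcs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    return parent


def cost(soft_db, refs, arcs, tuple_cost):
    """Assumed cost: weight of accepted arcs + tuple_cost per distinct hard tuple."""
    parent = build_parent(refs, arcs)
    hard_db = {tuple(find(parent, r) for r in t) for t in soft_db}
    return sum(w for _, _, w in arcs) + tuple_cost * len(hard_db), hard_db


def greedy_harden(soft_db, i_pot, tuple_cost=1.0):
    """Hypothetical greedy: repeatedly accept the arc that lowers total cost most."""
    refs = set(chain.from_iterable(soft_db)) | {r for a, b, _ in i_pot for r in (a, b)}
    chosen = []
    best, hard_db = cost(soft_db, refs, chosen, tuple_cost)
    remaining = list(i_pot)
    while remaining:
        # Score each remaining arc by the total cost after adding it.
        scored = [(cost(soft_db, refs, chosen + [arc], tuple_cost), arc) for arc in remaining]
        (new_cost, new_db), arc = min(scored, key=lambda s: s[0][0])
        if new_cost >= best:
            break  # no single merge lowers the total cost any further
        chosen.append(arc)
        remaining.remove(arc)
        best, hard_db = new_cost, new_db
    return chosen, hard_db, best


# Toy soft database: (person, affiliation) tuples with redundant name variants.
soft_db = [
    ("William Cohen", "AT&T Labs"),
    ("W. Cohen", "AT&T Labs"),
    ("W. Cohen", "AT&T"),
    ("H. Kautz", "AT&T"),
]
# Toy I_pot: candidate merges with made-up weights.
i_pot = [
    ("William Cohen", "W. Cohen", 0.3),
    ("AT&T Labs", "AT&T", 0.4),
    ("W. Cohen", "H. Kautz", 2.0),
]

chosen, hard_db, best = greedy_harden(soft_db, i_pot, tuple_cost=1.0)
print(chosen, sorted(hard_db), round(best, 2))
```

On this toy data the loop accepts the two cheap merges (the two Cohen name variants, then AT&T Labs with AT&T) and stops before the expensive W. Cohen/H. Kautz merge, which would shrink the database to a single tuple but raise the total cost. The line that scores the remaining arcs is also a natural place to plug in the kind of runtime reweighting of w raised in this review.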

I was interested in the set I_pot and the different ways it could be constructed, which may or may not affect the suboptimal solution. I also wonder whether varying some of the weights w at runtime, based on a heuristic function of the merges performed so far, would change the formulation and the solution.