Suranah writeup for Cohen 2000

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of the paper Cohen_2000_hardening_soft_information_sources by user:Suranah.

This is a theoretical paper that analyzes the problem of extracting a more concrete, "harder" database from the noisy, loosely structured, and highly redundant information produced by information extraction techniques. The problem is modeled graph-theoretically and shown to be NP-hard, and a greedy algorithm is proposed as an alternative to exact optimization.
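To make the greedy idea concrete, here is a minimal sketch of one way such a procedure might look. It is not the paper's algorithm. It assumes a simplified cost (a fixed cost per distinct hard tuple plus the summed weights of the accepted merges), and the names `greedy_harden`, `soft_tuples`, `i_pot`, and `tuple_cost`, as well as the toy data, are hypothetical choices made for illustration only.

```python
def find(parent, x):
    """Return the representative of x's merged group (union-find with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cost(soft_tuples, parent, accepted_weight, tuple_cost=1.0):
    """Assumed cost: one unit per distinct hard tuple plus the weights of accepted merges."""
    hard = {tuple(find(parent, r) for r in t) for t in soft_tuples}
    return tuple_cost * len(hard) + accepted_weight


def greedy_harden(soft_tuples, i_pot, tuple_cost=1.0):
    """Repeatedly accept the candidate merge from i_pot that lowers the cost the most."""
    refs = {r for t in soft_tuples for r in t}
    refs |= {r for pair in i_pot for r in pair}
    parent = {r: r for r in refs}
    remaining = dict(i_pot)  # candidate merges (a, b) -> weight w
    accepted, accepted_weight = [], 0.0

    while remaining:
        best = None
        best_cost = cost(soft_tuples, parent, accepted_weight, tuple_cost)
        for (a, b), w in remaining.items():
            trial = dict(parent)  # try the merge on a copy
            trial[find(trial, a)] = find(trial, b)
            c = cost(soft_tuples, trial, accepted_weight + w, tuple_cost)
            if c < best_cost:
                best, best_cost = (a, b), c
        if best is None:  # no remaining merge improves the cost
            break
        a, b = best
        parent[find(parent, a)] = find(parent, b)
        accepted_weight += remaining.pop(best)
        accepted.append(best)

    hard = sorted({tuple(find(parent, r) for r in t) for t in soft_tuples})
    return hard, accepted


# Toy soft database: three extracted tuples that plausibly describe one fact.
soft_tuples = [
    ("W. Cohen", "CMU"),
    ("William Cohen", "CMU"),
    ("W. Cohen", "Carnegie Mellon"),
]
# Hypothetical I_pot: candidate co-reference pairs with made-up weights.
i_pot = {
    ("W. Cohen", "William Cohen"): 0.3,
    ("CMU", "Carnegie Mellon"): 0.4,
    ("W. Cohen", "CMU"): 0.9,
}
hard, accepted = greedy_harden(soft_tuples, i_pot)
print(hard)
print(accepted)
```

In this toy run the two cheap merges are accepted and the implausible one is rejected, so the three soft tuples collapse into a single hard tuple. Which name ends up representing a merged group is an arbitrary artifact of the union-find bookkeeping in this sketch.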

I was interested in the set I_pot and the different ways it could be constructed, which may or may not affect the sub-optimal solution. Also, could varying some of the weights w at runtime, based on some heuristic function of the merges, change the formulation and the solution?