Mnduong writeup of Cohen et al. KDD '00

This is a review of Cohen_2000_hardening_soft_information_sources by user:mnduong.

My questions for this paper are:

  • In the cost function, it seems to me that the size of the set I is penalized twice, through both w(I) and |I|. For example, pairs with zero cost, such as William -> William, would still be penalized because they increase the size of I. It seems that minimizing the sum of |I(S)| and w(I) would be sufficient.
  • In section 3, formula (2) implicitly assumes that I and H are independent. This is probably true, but it deserves some justification. (I'm not quite sure about the relationship. One thing that seems obvious is that, given S, I and H are definitely dependent, because I(S) = H.)
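
The first question can be made concrete with a small toy sketch. It assumes, purely for illustration, that the cost is an unweighted sum of |I(S)|, |I| and w(I); the paper's actual cost function may use different terms or coefficients. The function names, the dictionary representation of I, and the one-column soft database are all hypothetical.

```python
# Toy model (assumption, not the paper's definition): an interpretation I is a
# dict mapping (a, b) reference pairs to a nonnegative weight w(a, b).

def apply_interpretation(I, soft_db):
    """Rewrite every reference in the soft tuples using the pairs in I."""
    mapping = {a: b for (a, b) in I}
    return {tuple(mapping.get(ref, ref) for ref in t) for t in soft_db}

def cost_with_size(I, soft_db):
    # Assumed form: |I(S)| + |I| + w(I)
    return len(apply_interpretation(I, soft_db)) + len(I) + sum(I.values())

def cost_without_size(I, soft_db):
    # Alternative suggested in the question: |I(S)| + w(I)
    return len(apply_interpretation(I, soft_db)) + sum(I.values())

soft_db = {("William",), ("Bill",)}

# Two interpretations that produce the same hard database {("William",)}.
I_small = {("Bill", "William"): 0.5}
I_large = {("Bill", "William"): 0.5, ("William", "William"): 0.0}

extra_with_size = cost_with_size(I_large, soft_db) - cost_with_size(I_small, soft_db)
extra_without_size = cost_without_size(I_large, soft_db) - cost_without_size(I_small, soft_db)
```

Under these assumptions, the zero-weight pair William -> William raises the cost by one when |I| is included, and leaves it unchanged when only |I(S)| and w(I) are summed.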