Mnduong writeup of Bunescu & Mooney ACL '07


This is a review of Bunescu_2007_learning_to_extract_relations_from_the_web_using_minimal_supervision by user:mnduong.

  • This paper introduces a method to extract relations that doesn't rely on having a large number of labeled examples. From a small set of positive sentences and negative sentences of a given relation, the system uses a web search engine to expand each positive sentence to a large bag of (possibly noisy) positive sentences, and each negative sentence to a large bag of (presumably) negative sentences.
  • The system then trains an SVM to separate the positive examples from the negative ones, with weights set such that false positive errors are penalized more heavily than false negative errors.
  • In the dual representation, the SVM uses a modified version of the subsequence kernel introduced in (Bunescu & Mooney '06). This version assigns a weight to each word in a subsequence, so that words that are highly correlated with either argument of the relation, but not necessarily with the relation itself, contribute less to the kernel.
  • The evaluation shows that the modified kernel improved upon the original subsequence kernel, which in turn performed much better than the baseline bag-of-words kernel. It's also competitive with a method using the subsequence kernel on supervised data, which requires a lot more annotations.
  • I didn't quite understand the authors' description of Type II bias. The problematic words there also seem to be highly correlated with the arguments rather than with the relation itself, which is the same problem as in Type I bias. If so, the weighting scheme used to address Type I bias would presumably solve the Type II problem as well.
  • The competitive performance against the supervised method is quite impressive. However, I would like to see the result of training on a portion of each bag and testing on the rest, instead of choosing separate bags for training and testing and then averaging the results. That would have been a fairer comparison.
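
To make two of the ideas summarized above more concrete, here is a minimal Python sketch. It is not the authors' implementation, and the exact formulas are assumptions: `word_weights` down-weights a word by the fraction of its occurrences in a bag that could be explained by its correlation with the arguments alone, and `asymmetric_hinge_loss` shows a hinge loss in which errors on negative examples (false positives) cost more than errors on positive examples. The names `p_word_given_args`, `cost_fp`, and `cost_fn`, and their default values, are hypothetical.

```python
from collections import Counter


def word_weights(bag, p_word_given_args):
    """Down-weight words that co-occur with the arguments more than expected.

    bag: list of tokenized sentences (lists of str) retrieved for one argument pair.
    p_word_given_args: hypothetical background estimate of P(word | either argument),
        for example from search-engine hit counts; missing words default to 0.0.
    """
    n = len(bag)
    # Count each word at most once per sentence.
    observed = Counter(w for sentence in bag for w in set(sentence))
    weights = {}
    for word, count in observed.items():
        # Occurrences we would expect from argument correlation alone.
        expected = p_word_given_args.get(word, 0.0) * n
        # Share of occurrences not explained by the arguments, clipped at 0.
        weights[word] = max(0.0, (count - expected) / count)
    return weights


def asymmetric_hinge_loss(scores, labels, cost_fp=1.0, cost_fn=0.1):
    """Hinge loss where slack on negative examples is penalized more heavily.

    scores: classifier outputs; labels: +1 or -1 for each example.
    cost_fp / cost_fn: assumed per-class costs, chosen only for illustration.
    """
    total = 0.0
    for score, label in zip(scores, labels):
        slack = max(0.0, 1.0 - label * score)
        total += (cost_fp if label == -1 else cost_fn) * slack
    return total


if __name__ == "__main__":
    bag = [
        ["google", "acquired", "youtube"],
        ["google", "video", "youtube"],
        ["google", "acquired", "video", "youtube"],
        ["google", "said", "youtube"],
    ]
    print(word_weights(bag, {"video": 0.5}))
    print(asymmetric_hinge_loss([0.5, -2.0, 0.5], [1, -1, -1]))
```

In this toy bag, "video" is assumed to appear in half of all sentences mentioning either argument, so its occurrences are fully explained by the arguments and it receives zero weight, while "acquired" keeps full weight. With noisy positive bags, the lower cost on positive examples lets the classifier tolerate mislabeled positives without accepting more false positives.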