Selen writeup of Bunescu 2007
This is a review of Bunescu_2007_learning_to_extract_relations_from_the_web_using_minimal_supervision by user:Selen.
The authors develop a new approach to relation extraction that does not require a large training dataset. This is an important goal, especially for biomedical data, where labeling a large training set is very time-consuming and the resulting model may still overfit. They tackle the problem with multiple instance learning. However, having a small number of "training bags" with a large number of examples in each can introduce two different biases. To counter these, they assign predefined weights to words and change the formulation of the SVM.
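To make the bag-size issue concrete, here is a minimal sketch of one way to keep large bags from dominating training: give every example a weight of 1/|bag|, so each bag contributes the same total weight. This is an illustration of the general idea only, not the paper's actual SVM formulation; the function name `instance_weights` and the toy bags are hypothetical.

```python
from typing import Dict, List


def instance_weights(bags: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """Assign each example a weight of 1/|bag| so every bag sums to 1.

    Assumption for illustration: a bag is the list of sentences that
    mention one entity pair. Empty bags are skipped.
    """
    return {
        name: [1.0 / len(sentences)] * len(sentences)
        for name, sentences in bags.items()
        if sentences
    }


# Toy bags (hypothetical): one large bag and one single-sentence bag.
bags = {
    "pair_a": ["s1", "s2", "s3", "s4"],
    "pair_b": ["s5"],
}
weights = instance_weights(bags)

for name, ws in weights.items():
    print(name, ws, "total =", sum(ws))
```

Weights of this kind could then be passed as per-example costs to a learner that supports them, so a bag with hundreds of sentences counts no more than a bag with one.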
This is one of the few papers I liked. It would be interesting to see how the approach compares against bootstrapping methods.