KeisukeKamataki writeup of Bunescu 2007
This is a review of Bunescu_2007_learning_to_extract_relations_from_the_web_using_minimal_supervision by user:KeisukeKamataki.
Summary: The authors extract entity pairs for two relations, Corporate Acquisition and Person-Birthplace, from web documents. They frame the problem as MIL (Multiple Instance Learning), with the decision function expressed in terms of kernels computed between bag instances so that a kernel-based SVM can be applied. The kernel is a modified version of the subsequence kernel. It requires no syntactic information and works directly in the feature space of word sequences. To augment the subsequence kernel, they also weight each word according to the correlation between the words in a sentence and the relation arguments. To simplify the problem, the experiments use only sentences that contain both argument words of the relation of interest. The method SSK-T1 (the subsequence kernel augmented with word weights) showed the best performance.
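To make the kernel idea concrete, here is a minimal sketch of a plain gap-weighted subsequence kernel over words. It is not the modified kernel from the paper: the function names, the brute-force enumeration, the decay parameter `lam`, and the optional `weights` dictionary (a stand-in for the word weighting) are all my own assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(words, n, lam, weights=None):
    """Map a word list to gap-weighted features over word subsequences of length n.

    Each (possibly non-contiguous) subsequence contributes lam ** span, where
    span is the number of positions it covers in the sentence. The optional
    `weights` dict is a hypothetical per-word weight (default 1.0), not the
    paper's weighting formula.
    """
    weights = weights or {}
    feats = defaultdict(float)
    for idx in combinations(range(len(words)), n):
        span = idx[-1] - idx[0] + 1
        u = tuple(words[i] for i in idx)
        w = 1.0
        for tok in u:
            w *= weights.get(tok, 1.0)
        feats[u] += (lam ** span) * w
    return feats


def subsequence_kernel(s, t, n=2, lam=0.5, weights=None):
    """Inner product of the two feature maps (brute force, for short sentences)."""
    fs = subsequence_features(s, n, lam, weights)
    ft = subsequence_features(t, n, lam, weights)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)


# Toy sentences with the two relation arguments replaced by placeholders.
s1 = "<e1> acquired <e2>".split()
s2 = "<e1> has acquired <e2>".split()
print(subsequence_kernel(s1, s2, n=2, lam=0.5))
# Hypothetical up-weighting of a word judged relevant to the relation.
print(subsequence_kernel(s1, s2, n=2, lam=0.5, weights={"acquired": 2.0}))
```

The enumeration is exponential in `n`, so this is only meant for reading and for toy inputs. A matrix of such kernel values could in principle be handed to any SVM that accepts a precomputed kernel.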
I like: Their experimental methodology, especially the comparison of extraction methods, makes sense. The results show the effectiveness not only of their own approach but also of the traditional approach, which requires much annotation effort.