Liuy writeup of Bunescu 2007
This is a review of Bunescu_2007_learning_to_extract_relations_from_the_web_using_minimal_supervision by user:Liuy.
This paper uses web information for relation extraction with only a small number of training examples. Given a set of named entity pairs that are known to exhibit or not exhibit a particular relation, their system extracts sentences that contain the pairs. Their experiments on the corporate acquisition test set and the person-birthplace test set show the reliability of their approach in extracting relations from web documents, compared with four baseline systems.
I like that they use a relation extraction (RE) kernel to cope with the weaker supervision. The kernel basically counts the common subsequences of tokens between two sentences. In particular, they modify the kernel so that it ignores subsequence patterns containing only stop words and punctuation signs.
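To make the idea of counting common token subsequences concrete, here is a minimal sketch. It is my own simplification, not the kernel from the paper: it enumerates subsequences by brute force, applies no gap penalty or decay factor, and the stop list STOP and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop list (stop words and punctuation); not taken from the paper.
STOP = {"the", "a", "of", "in", "by", "was", ",", "."}

def subsequences(tokens, max_len):
    """Count every token subsequence (gaps allowed) of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for idx in combinations(range(len(tokens)), n):
            pattern = tuple(tokens[i] for i in idx)
            # Ignore patterns that consist only of stop words or punctuation.
            if all(tok in STOP for tok in pattern):
                continue
            counts[pattern] += 1
    return counts

def common_subsequence_kernel(s, t, max_len=2):
    """Sum, over shared patterns, the product of their counts in s and t."""
    cs, ct = subsequences(s, max_len), subsequences(t, max_len)
    return sum(cs[p] * ct[p] for p in cs.keys() & ct.keys())

s = "google acquired youtube".split()
t = "google bought youtube".split()
print(common_subsequence_kernel(s, t))
```

For the two example sentences the shared patterns are (google), (youtube) and (google, youtube). A pair of sentences that share only stop words scores zero, which is the effect of the modification described above.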
I also like their work on reducing the two types of bias. The first type of bias is an overweighting of words, or combinations of words, that are correlated with individual arguments of a relation instance. The second type of bias is due to words specific to the relation instance. For the first type of bias, they assign a multiplicative weight to each token in the sequence; the more strongly a word is correlated with the arguments, the more its weight is decreased.
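The multiplicative weighting can be pictured with a small sketch. This is only an illustration under my own assumptions: the linear form 1 - correlation, the correlation scores, and the function names are hypothetical, and the paper estimates its weights differently.

```python
from math import prod

def token_weight(correlation):
    """Map a correlation score in [0, 1] to a weight in [0, 1] (assumed linear form)."""
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1]")
    return 1.0 - correlation

def pattern_weight(pattern, correlation):
    """Multiply the weights of all tokens in a pattern; unlisted tokens get weight 1."""
    return prod(token_weight(correlation.get(tok, 0.0)) for tok in pattern)

# Hypothetical score: "search" co-occurs strongly with the argument "Google".
correlation = {"search": 0.75}
print(pattern_weight(("search", "acquired"), correlation))
```

A pattern containing a word that is strongly tied to one argument contributes less to the kernel, while patterns made of words unrelated to the arguments keep their full weight.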
I am concerned with the following problems:
First, the solution for type I bias is partial. Second, they do not have a solution for type II bias.
In either case, they do not provide a statistical proof that their strategy reduces the bias with high probability for certain data distributions.