Mnduong writeup of Brin 1999

This paper introduces an approach to extracting patterns and relations from the Web using very little training data. The method takes as input a small set of example relations. It then searches the Web for occurrences of these examples. From these occurrences, it extracts patterns, which in turn are used to search for more example relations. The process is repeated until enough relations are extracted.
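To make the loop concrete, here is a minimal sketch of it on a toy in-memory corpus rather than the Web. Everything in it is my own simplification: a "pattern" is just the text found between a title and an author on one line, the names `bootstrap`, `CORPUS` and `SEEDS` are hypothetical, and the loop stops when a round finds nothing new (or after `max_rounds`) instead of when "enough" relations are collected.

```python
import re

# Toy stand-in for the Web: one candidate occurrence per line (assumed data).
CORPUS = [
    "Foundation, by Isaac Asimov",
    "Dune, by Frank Herbert",
    "Emma, by Jane Austen",
    "Ulysses (James Joyce)",
]
SEEDS = {("Isaac Asimov", "Foundation")}  # (author, title) pairs


def bootstrap(seeds, corpus, max_rounds=5):
    """Grow a set of (author, title) pairs from a few seed pairs."""
    relations = set(seeds)
    for _ in range(max_rounds):
        # Step 1: find occurrences of known pairs and keep the text
        # between title and author as a (very crude) pattern.
        middles = set()
        for author, title in relations:
            occurrence = re.compile(re.escape(title) + "(.+?)" + re.escape(author))
            for line in corpus:
                match = occurrence.search(line)
                if match:
                    middles.add(match.group(1))
        # Step 2: apply each pattern to the corpus to find new pairs.
        found = set()
        for middle in middles:
            pattern = re.compile("(.+?)" + re.escape(middle) + "(.+)")
            for line in corpus:
                match = pattern.fullmatch(line)
                if match:
                    found.add((match.group(2), match.group(1)))
        # Step 3: stop once a round adds nothing new.
        if found <= relations:
            break
        relations |= found
    return relations


result = bootstrap(SEEDS, CORPUS)
```

With the single seed above, the ", by " pattern picks up the Herbert and Austen lines, while the "Ulysses (James Joyce)" line is never reached because no known pair occurs in that format.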
The core of the algorithm lies in the pattern extractor. The method emphasizes precision over recall, because the Web is large enough to make up for low recall. Precision matters because errors in the extracted patterns cause errors in the relations that are extracted using those patterns. High precision is enforced by rejecting patterns whose specificity is too low.
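As an illustration of the specificity filter, the sketch below scores a pattern by the product of the lengths of its fixed strings and drops anything at or below a threshold. This scoring is my assumption for illustration only; I have not reproduced the paper's exact definition here, and the names `specificity`, `keep_specific` and the threshold value are hypothetical.

```python
from math import prod


def specificity(pattern):
    """Assumed proxy: product of the lengths of the pattern's fixed strings."""
    return prod(len(part) for part in pattern)


def keep_specific(patterns, threshold):
    """Reject patterns whose specificity does not exceed the threshold."""
    return [p for p in patterns if specificity(p) > threshold]


# Each candidate is (prefix, middle, suffix); both are made-up examples.
candidates = [
    ("<li><b>", "</b> by ", " ("),  # long fixed strings: unlikely to match by accident
    ("", " ", ""),                  # a lone space: matches almost anything
]
kept = keep_specific(candidates, threshold=20)
```

The point of the product is that a pattern with any empty or very short component scores near zero, so overly general patterns are the first to be rejected.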
The method was evaluated in the task of extracting (author, title) relations for books, using an initial example set of size 5.
The results of the experiment are not clear. The author took a random sample of 20 of the extracted books, which is by no means a satisfying method. I think it should be possible to verify the resulting relation by using a strict, exact-match search engine for books. Unfortunately "the Library of Congress search system was down at the time of these tests".
