Hoffmann et al., ACL 2011
Raphael Hoffmann, Congle Zhang, Xiao Ling, Luke Zettlemoyer, Daniel S. Weld. 2011. Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 541-550.
This paper addresses the problem of relation extraction: given a corpus of unstructured text, the authors want to produce a set of relations involving the entities that appear in the corpus. Their contribution is a novel way of using a knowledge base to perform this task. They use information from the knowledge base to learn the weights of a classifier that maps sentences to relations over entities.
They use a Conditional Random Field to model the possible facts in a database and their relationship to the corpus. A set of known relations from Freebase is used to learn the weights for the factors in the CRF, and the learned model is then applied to new sentences (and to more entities? that part wasn't exactly clear to me) to extract more relations.
An interesting aspect of their paper is that they distinguish between "aggregate extraction" and "sentential extraction." Aggregate extraction is predicting relations given the entire corpus; sentential extraction is predicting those relations along with the individual sentences that support each prediction. They try to do both in their paper, but most of their work is on sentential extraction, and they get aggregate results only by a simple OR over all sentences: a relation holds for an entity pair if at least one sentence is predicted to express it. That perhaps puts a little too much confidence in the sentence extractors, and some aggregate classifier might be more appropriate. But they do it that way because it greatly simplifies learning and inference, and it still gives them better performance than previous methods.
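To make the "simple OR over all sentences" concrete, here is a minimal sketch in Python. It is not the authors' code: the function name `aggregate_by_or`, the input format, and the toy entity pairs and relation labels are all assumptions made for illustration. It assumes each sentence has already been assigned at most one relation (or none) by some sentence-level extractor.

```python
from collections import defaultdict

def aggregate_by_or(sentence_predictions):
    """Combine sentence-level predictions into aggregate facts.

    sentence_predictions: iterable of (entity_pair, relation) tuples, one per
    sentence, where relation is None if the sentence expresses no relation.
    (This input format is an assumption for the sketch.)
    """
    facts = defaultdict(set)
    for pair, relation in sentence_predictions:
        # Deterministic OR: a relation holds for the pair as soon as
        # a single sentence is predicted to express it.
        if relation is not None:
            facts[pair].add(relation)
    return dict(facts)

# Hypothetical toy predictions, one tuple per sentence.
preds = [
    (("Steve Jobs", "Apple"), "founded"),
    (("Steve Jobs", "Apple"), "ceo_of"),
    (("Steve Jobs", "Apple"), None),
    (("Paris", "France"), None),
]
aggregated = aggregate_by_or(preds)
```

Note that one entity pair can end up with several relations, which is the overlapping case the paper's title refers to, and that a single wrong sentence-level prediction is enough to produce a wrong aggregate fact. That is the over-confidence concern raised above.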
Their system achieved very high precision (100% for many relation types), though with relatively low recall (their best was 56%, with an average of around 15%).
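For readers less familiar with the two metrics, the sketch below shows how precision and recall trade off for an extractor that predicts few facts but gets them right. The helper `precision_recall` and the toy fact sets are hypothetical and are not taken from the paper's evaluation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for sets of extracted vs. true facts."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical toy data: four true facts, only one of which is extracted.
gold_facts = {("a", "r", "b"), ("c", "r", "d"), ("e", "r", "f"), ("g", "r", "h")}
predicted_facts = {("a", "r", "b")}
p, r = precision_recall(predicted_facts, gold_facts)
```

In this toy case every extracted fact is correct, so precision is perfect, but three of the four true facts are missed, so recall is low. The paper's results have the same qualitative shape.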
This paper is related to Rahman and Ng, ACL 2011, in that both try to improve performance on a structured prediction problem by using a knowledge base as input to their algorithms.