Bbd writeup of Bootstrapping Extractors
Revision as of 10:42, 3 September 2010
This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:Bbd.
This paper proposes a new approach to obtaining labeled data for information extraction systems. The authors pick a database whose records contain the entities of interest, together with a corresponding free-text corpus. The technique automatically induces a labeling of an input text sequence by word-aligning it with a matching database record. They train one CRF, called AlignCRF, to align database records with text, and another CRF, called ExtrCRF, to extract labels from text.
I liked their way of training the models. First they estimate the parameters of AlignCRF and compute the marginal probabilities of the labels given the data. To estimate the parameters of ExtrCRF, they then minimize the KL divergence between the label probabilities given the data under ExtrCRF and those under AlignCRF.
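To make the second training step concrete, here is a minimal sketch of the quantity being minimized, under several assumptions that are mine rather than the paper's: the marginals are per-token distributions over a small label set, the divergence is taken in the direction KL(AlignCRF || ExtrCRF), and the numbers and the names `kl_divergence` and `sequence_kl` are made up for illustration. It only evaluates the objective; it does not train a CRF.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def sequence_kl(align_marginals, extr_marginals):
    """Sum of per-token KL divergences over one text sequence."""
    return sum(kl_divergence(p, q) for p, q in zip(align_marginals, extr_marginals))

# Hypothetical per-token marginals over three labels, e.g. [AUTHOR, TITLE, OTHER].
align = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]           # targets from AlignCRF
extr_far = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]  # untrained, uniform ExtrCRF
extr_close = [[0.85, 0.1, 0.05], [0.15, 0.75, 0.1]]    # ExtrCRF after some training

loss_far = sequence_kl(align, extr_far)
loss_close = sequence_kl(align, extr_close)
print(loss_far, loss_close)
```

In this toy setup the loss shrinks as the ExtrCRF marginals move toward the AlignCRF marginals, and it is zero when the two agree exactly, which is the behaviour the training objective relies on.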