Bbd writeup of Bootstrapping Extractors

From Cohen Courses
Revision as of 13:25, 31 May 2016 by WikiAdmin

This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:Bbd.

This paper proposes a new approach to obtaining labeled data for information extraction systems. The authors pick a database whose records contain the entities of interest, together with a corresponding free-text corpus. The technique automatically induces a labeling of an input text sequence using a word alignment with a matching database record. They train one CRF, called AlignCRF, to align text with database records, and another CRF, called ExtrCRF, to extract labels from text.

I liked their way of training the models. First they estimate the parameters of AlignCRF and compute the marginal probabilities of the labels given the data. Then, to estimate the parameters of ExtrCRF, they minimize the KL divergence between the label probabilities given the data under ExtrCRF and those under AlignCRF.
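
To make the second training step concrete, here is a minimal sketch of fitting one distribution to another by minimizing KL divergence. It is not the paper's implementation: a per-token softmax stands in for ExtrCRF, the target marginals and the label set (AUTHOR, TITLE, OTHER) are made-up values, and the direction KL(align || extr) is an assumption of this sketch.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def fit_logits(target, steps=500, lr=0.5):
    """Gradient descent on logits so that softmax(logits) approaches target.

    The gradient of KL(target || softmax(z)) with respect to z
    is softmax(z) - target.
    """
    logits = [0.0] * len(target)
    for _ in range(steps):
        q = softmax(logits)
        logits = [z - lr * (qi - ti) for z, qi, ti in zip(logits, q, target)]
    return logits

# Hypothetical marginals from the alignment model for a single token,
# over the assumed labels (AUTHOR, TITLE, OTHER).
align_marginals = [0.7, 0.2, 0.1]

kl_before = kl_divergence(align_marginals, softmax([0.0, 0.0, 0.0]))
kl_after = kl_divergence(align_marginals, softmax(fit_logits(align_marginals)))
```

In this toy setting the fitted distribution can match the target exactly, so `kl_after` ends up close to zero. In a real extraction model the parameters are shared across tokens and features, so the extractor can only approximate the alignment model's marginals.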
