Mnduong writeup of Bellare & McCallum 2009

From Cohen Courses

This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:mnduong.

  • This paper introduces a bootstrapping method for information extraction that uses an existing database of records of a similar type. The method first labels the text by aligning it with the database records, then trains a text extractor on the induced labeling.
  • The word alignment is done with a conditional random field (CRF) conditioned on both the record and the text sequence. The model does not require labeled word alignments; instead, it is trained with generalized expectation criteria, which push the model's expectations toward target expectations. A criterion can be global or specific to a local sequence pair. The extraction is also done with a conditional random field.
  • The alignment model was shown to outperform alternatives such as IBM Model 4 and an HMM. The extraction model also outperformed state-of-the-art extractors, with an error reduction that is statistically significant at the 0.005 level.
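
The idea of a generalized expectation criterion, keeping a model expectation close to a target expectation, can be illustrated with a small sketch. This is not the paper's objective or code: the squared-distance penalty, the function names (model_expectation, ge_penalty), the toy feature, and the numbers are all assumptions made for illustration. A real alignment CRF would compute the distribution over alignments from its parameters rather than take it as a given list.

```python
def model_expectation(probs, feature_values):
    """Expected value of one feature under a distribution over
    candidate alignments (probs is assumed to sum to 1)."""
    return sum(p * f for p, f in zip(probs, feature_values))


def ge_penalty(probs, feature_values, target, weight=1.0):
    """Assumed penalty form: weighted squared distance between the
    model expectation and the target expectation."""
    diff = model_expectation(probs, feature_values) - target
    return weight * diff * diff


# Toy data (hypothetical): three candidate alignments for one
# record-text pair. The feature is 1.0 when a text token is aligned
# to an identical record token, and 0.0 otherwise.
probs = [0.5, 0.25, 0.25]
feature_values = [1.0, 1.0, 0.0]

print(model_expectation(probs, feature_values))        # 0.75
print(ge_penalty(probs, feature_values, target=1.0))   # 0.0625
```

Training would adjust the model so that this penalty shrinks, which is how the criteria stand in for labeled alignments. In this sketch, a global criterion would correspond to a target shared across all record-text pairs, and a local one to a target set for a single pair.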