Wka writeup of Bellare and McCallum 2009

From Cohen Courses

This is a review of bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:wka.

Use data in a database (DB) to annotate text: a CRF aligns tokens in DB records with their occurrences in the text, and the resulting annotation is used to train an extractor.
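The idea of projecting DB field labels onto text can be sketched as follows. This is only an illustration, not the paper's method: it swaps the learned CRF alignment for a naive exact string match, and the record contents, field names, and the function `project_labels` are all hypothetical.

```python
def project_labels(record, text_tokens, default="O"):
    """Label each text token with the DB field whose value contains it.

    Naive stand-in for a learned alignment: exact, case-insensitive match.
    """
    # Map each lower-cased DB token to the first field it appears in.
    token_to_field = {}
    for field, value in record.items():
        for tok in value.split():
            token_to_field.setdefault(tok.lower(), field)
    # Tokens with no match in the record get the default (outside) label.
    return [token_to_field.get(tok.lower(), default) for tok in text_tokens]


# Hypothetical DB record and a tokenized text mention of it.
record = {"author": "Bellare McCallum", "year": "2009"}
text_tokens = "K. Bellare and A. McCallum . 2009".split()
labels = project_labels(record, text_tokens)
```

The projected labels could then serve as noisy training data for an extractor. A learned alignment matters precisely because real text rarely matches DB values this cleanly.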

CRF

  • Feature vector:
    • alignment features: on source-target tokens
    • extraction features: on source labels and target text
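The two feature groups above might look roughly like this in code. This is a minimal sketch under assumptions: the specific features (exact match, 3-character prefix match, word identity, digit test) and all function names are hypothetical examples, not the feature set used in the paper.

```python
from collections import Counter


def alignment_features(src_token, tgt_token):
    """Features on a (DB token, text token) pair. Hypothetical examples."""
    feats = Counter()
    if src_token.lower() == tgt_token.lower():
        feats["align:exact_match"] += 1
    if src_token[:3].lower() == tgt_token[:3].lower():
        feats["align:prefix3_match"] += 1
    return feats


def extraction_features(src_label, tgt_tokens, position):
    """Features on a DB field label and the target text. Hypothetical examples."""
    tok = tgt_tokens[position]
    feats = Counter()
    feats[f"extr:{src_label}|word={tok.lower()}"] += 1
    if tok.isdigit():
        feats[f"extr:{src_label}|is_digit"] += 1
    return feats


def feature_vector(src_token, src_label, tgt_tokens, position):
    """Combine both feature groups into one sparse feature vector."""
    feats = alignment_features(src_token, tgt_tokens[position])
    feats.update(extraction_features(src_label, tgt_tokens, position))
    return feats


# Aligning the DB token "2009" (field "year") with the second text token.
fv = feature_vector("2009", "year", ["EMNLP", "2009"], 1)
```

Note that the extraction features never look at the DB token itself, only at its label and the text, which is what would let an extractor built on them run without a DB record.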

Parameters are optimized with L-BFGS. The objective is non-convex, but local optima work fine in practice. For ExtrCRF the objective is convex.

ExtrCRF (a first-order model) is as accurate as AlignCRF (a zero-order model), even without access to DB data.

  • 31% error reduction over the previous state of the art
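The first-order versus zero-order distinction mentioned above can be illustrated with a small decoding sketch. A zero-order model scores each position's label independently, while a first-order model also scores transitions between adjacent labels (decoded here with Viterbi). The scores, label set, and function names below are made up for illustration and are not taken from the paper.

```python
def decode_zero_order(emissions):
    """Pick the best label at each position independently."""
    return [max(scores, key=scores.get) for scores in emissions]


def decode_first_order(emissions, transitions):
    """Viterbi decoding: each label's score also depends on the previous label."""
    labels = list(emissions[0])
    best = dict(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][cur])
            new_best[cur] = best[prev] + transitions[prev][cur] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the highest-scoring path back from the final position.
    path = [max(labels, key=best.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up scores: the middle token weakly prefers "O" on its own.
emissions = [
    {"author": 2.0, "O": 0.0},
    {"author": 0.4, "O": 0.5},
    {"author": 2.0, "O": 0.0},
]
# Transitions reward staying in the same label.
transitions = {
    "author": {"author": 0.5, "O": -0.5},
    "O": {"author": -0.5, "O": 0.5},
}
zero = decode_zero_order(emissions)
first = decode_first_order(emissions, transitions)
```

With these scores the zero-order decoder labels the middle token "O", while the first-order decoder keeps the whole span as "author" because the transition scores favor label continuity.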