Wka writeup of Bellare and McCallum 2009
From Cohen Courses
This is a review of bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:wka.
The paper uses data in a DB to annotate text. A CRF aligns tokens in the DB with their occurrences in the text. The resulting annotation is used to train an extractor.
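The annotate-by-alignment idea can be illustrated with a toy sketch. This is not the paper's method: exact, case-insensitive token matching stands in for the learned CRF alignment, and the function name `project_labels`, the record fields, and the example citation are all hypothetical.

```python
from typing import Dict, List, Tuple


def project_labels(record: Dict[str, str], text_tokens: List[str]) -> List[Tuple[str, str]]:
    """Label each text token with the DB field whose value contains it.

    Exact (case-insensitive) token match is a crude stand-in for a learned
    alignment model; tokens with no match get the label "O".
    """
    # Map each lowercased record token to the first field it appears in.
    token_to_field: Dict[str, str] = {}
    for field, value in record.items():
        for tok in value.split():
            token_to_field.setdefault(tok.lower(), field)
    return [(tok, token_to_field.get(tok.lower(), "O")) for tok in text_tokens]


# Hypothetical DB record and a matching citation string.
record = {"author": "jane smith", "title": "learning to align", "year": "2009"}
text_tokens = "J. Smith . 2009 . Learning to Align".split()

# The projected labels are the noisy annotation an extractor would train on.
labeled = project_labels(record, text_tokens)
```

Note that "J." stays unlabeled here because exact matching cannot link it to "jane"; handling such variation is what a learned alignment is for.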
CRF
- Feature vector:
  - alignment features: defined on source-target token pairs
  - extraction features: defined on source labels and target text
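A minimal sketch of how such a feature vector might be assembled, assuming simple indicator features. The specific features and the function names `alignment_features`, `extraction_features`, and `feature_vector` are hypothetical illustrations, not the feature set used in the paper.

```python
from typing import Dict


def alignment_features(source_tok: str, target_tok: str) -> Dict[str, float]:
    """Features on a (source token, target token) pair."""
    s, t = source_tok.lower(), target_tok.lower()
    return {
        "exact_match": float(s == t),
        "same_first_char": float(bool(s) and bool(t) and s[0] == t[0]),
        # e.g. target "j." abbreviating source "jane"
        "target_abbreviates_source": float(
            t.endswith(".") and len(t) > 1 and s.startswith(t[:-1])
        ),
    }


def extraction_features(source_label: str, target_tok: str) -> Dict[str, float]:
    """Features on the source label and the target text only."""
    return {
        f"label={source_label}&is_digit": float(target_tok.isdigit()),
        f"label={source_label}&is_capitalized": float(target_tok[:1].isupper()),
    }


def feature_vector(source_tok: str, source_label: str, target_tok: str) -> Dict[str, float]:
    """Concatenate both feature groups into one sparse vector."""
    feats: Dict[str, float] = {}
    feats.update(alignment_features(source_tok, target_tok))
    feats.update(extraction_features(source_label, target_tok))
    return feats


# Hypothetical example: DB token "jane" (field "author") against text token "J."
fv = feature_vector("jane", "author", "J.")
```

Because the extraction features never look at the source token itself, a model that keeps only that group can be applied to text for which no DB record is available.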
Optimization uses L-BFGS. The objective is non-convex, but local optima are fine in practice. For ExtrCRF the objective is convex.
ExtrCRF (a first-order model) is as accurate as AlignCRF (a zero-order model), even without access to DB data.
- Error reduced by 31% relative to the previous state of the art
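For reference, the zero-order versus first-order distinction can be written in generic CRF notation. The symbols below ($z_t$ for the output variable at position $t$, $x$ for the input, $\theta$ for the weights, $f$ for the feature function) are assumptions of this note and not necessarily the paper's own parameterization.

```latex
% Zero-order: each position is predicted independently given the input.
p_\theta(z \mid x) = \prod_{t=1}^{T} \frac{\exp\big(\theta^\top f(z_t, x, t)\big)}
                                          {\sum_{z'} \exp\big(\theta^\top f(z', x, t)\big)}

% First-order (linear chain): features may also depend on the previous output,
% so normalization is over whole sequences.
p_\theta(z \mid x) = \frac{1}{Z_\theta(x)}
    \exp\Big( \sum_{t=1}^{T} \theta^\top f(z_{t-1}, z_t, x, t) \Big),
\qquad
Z_\theta(x) = \sum_{z'} \exp\Big( \sum_{t=1}^{T} \theta^\top f(z'_{t-1}, z'_t, x, t) \Big)
```

In the zero-order form the normalizer factorizes per position, while the first-order form needs a forward-backward style computation for $Z_\theta(x)$.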