Sgopal1 writeup of GE expectation

From Cohen Courses

This is a review of the paper Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:sgopal1.


This paper presents a method for aligning text sequences with database records. Using an alignment-based CRF model, the authors achieve a significant error reduction. They define a set of features (alignment and extraction features) whose expected behavior should typically hold in the dataset. An alignment CRF is trained to identify the best alignment sequence (using Viterbi and Baum-Welch). They then define an extraction CRF, which gives a probability distribution over label sequences given a text sequence. Training is done by minimizing the KL divergence between the target expectations of the features and the model's expected values. The paper also explains some of the intuition behind how the features are generated.

I also liked the evaluation and discussion sections. The authors discuss in some detail the cases where the other models get stuck and their method does not.
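To make the KL-based training criterion concrete, here is a minimal sketch of a single generalized expectation (GE) term. It assumes the per-token label marginals of the model are already available as plain lists. In a linear-chain CRF these marginals are typically computed with forward-backward, which this sketch does not implement. The function names, the three-label set, the feature, and all numbers are hypothetical and chosen only for illustration. This is not the authors' implementation.

```python
import math


def expected_label_distribution(marginals, feature_fires):
    """Model's expected label distribution conditioned on a feature:
    average the per-token label marginals over the tokens where the feature fires."""
    rows = [m for m, fires in zip(marginals, feature_fires) if fires]
    if not rows:
        raise ValueError("feature never fires, so its expectation is undefined")
    n_labels = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n_labels)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def ge_penalty(target, marginals, feature_fires):
    """One GE term: KL divergence between the target distribution for a
    feature and the model's expected label distribution for that feature."""
    return kl_divergence(target, expected_label_distribution(marginals, feature_fires))


# Hypothetical example with labels [TITLE, AUTHOR, OTHER] and one binary feature.
marginals = [
    [0.7, 0.2, 0.1],  # token 1: feature fires
    [0.5, 0.3, 0.2],  # token 2: feature fires
    [0.1, 0.1, 0.8],  # token 3: feature does not fire
]
fires = [True, True, False]
target = [0.9, 0.05, 0.05]  # hypothetical target: this feature mostly indicates TITLE

print(expected_label_distribution(marginals, fires))
print(ge_penalty(target, marginals, fires))
```

The penalty is zero when the model's expectation matches the target and grows as they diverge. A full GE objective would sum such terms over all constrained features (typically with a prior on the parameters) and optimize the CRF parameters by gradient methods; that part is omitted here.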