Selen writeup of Bellare and McCallum

From Cohen Courses

This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:Selen.


In this paper, the authors align structured records with unstructured input text. They propose a method that aligns the words in the text with the fields of records in a database and then labels the text accordingly. To do this, they use two CRFs: AlignCRF, trained with alignment features, and ExtractCRF, trained with extraction features. They train and test the method on DBLP records paired with text retrieved by querying the authors' names.

I wonder what would happen if, instead of designing AlignCRF, they had used a distance metric (such as Levenshtein distance) together with a string alignment algorithm and then scored the resulting alignment.
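The distance-based alternative suggested above could look roughly like the following minimal Python sketch. It assumes each database record is a hypothetical dict mapping field names to strings and that the text is already tokenized. It computes Levenshtein edit distance with dynamic programming, converts it to a normalized similarity, and labels each token with the record field containing its most similar word, falling back to `O` when the best similarity is below an arbitrary threshold. This is a swapped-in heuristic baseline, not the AlignCRF method from the paper. The record, the tokens, and the 0.8 threshold are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions (DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1]: 1 minus distance over the longer length."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def align_tokens(tokens, record, threshold=0.8):
    """Label each token with the field of its most similar record word, or "O".

    Returns a list of (token, label, best_similarity) triples.
    """
    aligned = []
    for tok in tokens:
        best_field, best_score = "O", 0.0
        for field, value in record.items():
            for word in value.split():
                score = similarity(tok, word)
                if score > best_score:
                    best_field, best_score = field, score
        # Tokens that match no record word closely enough stay unaligned.
        label = best_field if best_score >= threshold else "O"
        aligned.append((tok, label, best_score))
    return aligned


# Hypothetical database record and hypothetical tokenized text (note the typo "Lerning").
record = {"author": "Jane Doe", "title": "Learning to Align Records", "year": "2009"}
tokens = ["Doe", "Lerning", "to", "align", "records", "2009", "Proceedings"]

for tok, label, score in align_tokens(tokens, record):
    print(f"{tok:12s} {label:7s} {score:.3f}")
```

Because each token is scored independently, this baseline cannot use context such as neighboring labels to decide between fields that match equally well. A sequence model like a CRF can exploit that context.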