Suranah writeup for Bellare 2009

From Cohen Courses

This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:Suranah.

One of the more interesting papers I have reviewed for the course. Instead of relying on manually annotated data, the authors exploit the availability of a structured database paired with related input texts. The alignments between records and text are learned with a model similar to IBM Model 1, built as a CRF with several alignment constraints. It is interesting to see that this beats IBM Model 4. One possible reason for the performance could be their use of string similarity methods.
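To make the string similarity point concrete, here is a toy sketch of aligning text tokens to record fields by string similarity alone. It is not the authors' CRF model, only an illustration of the intuition; the function names `similarity` and `align_tokens`, the 0.8 threshold, and the example record are all my own assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_tokens(record, tokens, threshold=0.8):
    """Label each token with the best-matching record field, or "O".

    A token is assigned to a field only if it is at least `threshold`
    similar to some word in that field's value (hypothetical heuristic,
    not the constraint-based training used in the paper).
    """
    labels = []
    for token in tokens:
        best_label, best_score = "O", threshold
        for field, value in record.items():
            for word in value.split():
                score = similarity(token, word)
                if score >= best_score:
                    best_label, best_score = field, score
        labels.append(best_label)
    return labels


# Hypothetical database record and a matching citation-like text.
record = {"author": "Kedar Bellare", "year": "2009"}
tokens = "K. Bellare and A. McCallum , 2009".split()
labels = align_tokens(record, tokens)
print(list(zip(tokens, labels)))
```

Even this crude matcher picks up exact or near-exact field mentions, which a purely co-occurrence-based translation model would have to learn from data. It also misses abbreviated mentions such as "K.", which is presumably where learned alignment constraints earn their keep.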

These alignments are then used to train an extraction classifier. The extraction classifier performs no worse than the alignment model, even though the test records are not used during decoding. The paper has rather extensive experiments (given the limited space), and the results are encouraging.

I find this approach quite novel. On a more general note, though, I find it strange that the authors do not discuss the broader implications of this approach for other IE problems (for example, aligning Freebase data to Wikipedia introductions and using the alignments to learn extractors). Is it possible that there are not many readily available instances of the kind of parallel data their approach demands? Or might the alignments be significantly worse for more free-form input text?