Apappu writeup on Bellare and McCallum

From Cohen Courses
Revision as of 13:11, 28 October 2009 by Apappu (talk | contribs)

This is a review of Bellare_2009_generalized_expectation_criteria_for_bootstrapping_extractors_using_record_text_alignment by user:Apappu.

  • A CRF-based approach to aligning DB records with their corresponding realizations in running text.
  • The paper describes a method for annotating input text with labels using a word alignment mechanism between the input text and a database record.

  • The word alignment model imitates IBM Model 1, where each target token could be mapped to more than one source token.
  • The advantage of a word alignment model is that it tolerates discrepancies caused by spelling errors, word insertions or deletions, and extra fields.
  • The authors propose two feature sets, each with its own CRF: one addresses the alignment problem, and the other handles extraction, with features defined on labels and input text. The difference between the two is that alignCRF is a zero-order model, whereas extrCRF is a first-order one.
  • The authors found that alignCRF outperforms the generative alignment models, namely IBM Model 4 and the HMM alignment model. In particular, better performance is expected when the model can use non-independent features of the running text.
  • The authors also showed that their extraction CRF outperforms the other models they compared against as well as previous state-of-the-art systems.
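
To make the idea of labelling text through record alignment concrete, here is a minimal sketch. It is not the authors' alignCRF: the learned alignment model is swapped for a plain string-similarity score from Python's `difflib`, and the function names, the example record, and the 0.8 threshold are all assumptions made for illustration. Each text token is linked independently to at most one record token (or to none), which is a simplification in the spirit of a zero-order model, and the token inherits the field name of the record token it aligns to.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in for a learned score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_and_label(text_tokens, record, threshold=0.8):
    """Label each text token with the field of its best-matching record token.

    `record` maps field names to string values (hypothetical schema).
    Tokens with no match above `threshold` get the label "O" (unaligned).
    Every token is decided independently of its neighbours.
    """
    # Flatten the record into (field, token) pairs.
    record_tokens = [
        (field, tok) for field, value in record.items() for tok in value.split()
    ]
    labels = []
    for tok in text_tokens:
        best_field, best_score = "O", threshold
        for field, rec_tok in record_tokens:
            score = similarity(tok, rec_tok)
            if score > best_score:
                best_field, best_score = field, score
        labels.append(best_field)
    return labels


# Hypothetical DB record and a noisy mention of it in running text.
record = {
    "author": "Andrew McCallum",
    "title": "Conditional Random Fields",
    "year": "2009",
}
text = ["McCalum", "wrote", "Conditional", "Randm", "Fields", "in", "2009"]

labels = align_and_label(text, record)
for token, label in zip(text, labels):
    print(f"{token}\t{label}")
```

The misspelled tokens "McCalum" and "Randm" still pick up the author and title labels, while "wrote" and "in" stay unaligned, which mirrors the robustness to spelling errors and inserted words noted in the review. A first-order extraction model would additionally score transitions between adjacent labels rather than deciding each token in isolation.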