KeisukeKamataki writeup of Borkar et al 2001

From Cohen Courses

This is a review of Borkar_2001_Automatic_Segmentation_of_Text_Into_Structured_Records by user:KeisukeKamataki.


  • Summary: The authors extend the basic HMM by combining three techniques (a nested two-level model, a modified Viterbi algorithm, and hierarchical feature selection) and apply it to the IE task of address extraction from free text. The nested two-level model gives a natural sequence model that takes into account the properties of each "element" (e.g. PhoneNo, City name). Hierarchical feature selection lets them redefine the features so that observations get an intuitive, natural representation, which helps the HMM model the state sequence naturally. Viterbi is modified in order to handle this new HMM. The modification violates the basic assumption of Viterbi, but it makes it possible to find the best path in the context of pre-defined semantics.

  • I like: This is also a well-written paper. Their approach of defining domain-specific semantics for observation patterns and incorporating them into the HMM sounds reasonable. I feel that defining the semantics of the data precisely and taking advantage of features derived from those semantics might be a key idea for building a good IE system.
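
To make the decoding step in the summary more concrete, here is a minimal sketch of plain Viterbi decoding over a flat, single-level HMM for address tokens. It is not the paper's method: there is no nested inner HMM and no modified Viterbi. The `symbol` function only imitates the idea of replacing raw tokens with coarser feature classes. The state names, symbol classes, and all probabilities are assumptions invented for illustration.

```python
import math

# Hypothetical element labels; the paper's actual element set is not reproduced here.
STATES = ["HouseNo", "Street", "City"]

# Toy parameters, invented for illustration only (not estimated from any data).
START = {"HouseNo": 0.8, "Street": 0.15, "City": 0.05}
TRANS = {
    "HouseNo": {"HouseNo": 0.1, "Street": 0.8, "City": 0.1},
    "Street": {"HouseNo": 0.05, "Street": 0.55, "City": 0.4},
    "City": {"HouseNo": 0.05, "Street": 0.05, "City": 0.9},
}
EMIT = {
    "HouseNo": {"DIGITS": 0.9, "WORD": 0.05, "STREET_SUFFIX": 0.05},
    "Street": {"DIGITS": 0.1, "WORD": 0.5, "STREET_SUFFIX": 0.4},
    "City": {"DIGITS": 0.05, "WORD": 0.9, "STREET_SUFFIX": 0.05},
}


def symbol(token):
    """Map a raw token to a coarse feature class (assumed classes)."""
    if token.isdigit():
        return "DIGITS"
    if token.lower() in {"st", "street", "ave", "road", "rd"}:
        return "STREET_SUFFIX"
    return "WORD"


def viterbi(tokens):
    """Return the most likely state sequence under the toy HMM (standard Viterbi)."""
    if not tokens:
        return []
    obs = [symbol(t) for t in tokens]
    # Log-scores of the best path ending in each state at the first position.
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


print(viterbi(["221", "Baker", "Street", "London"]))
```

With these toy numbers the tokens are labeled HouseNo, Street, Street, City. In the nested model described in the summary, each of these outer states would itself contain an inner HMM over the tokens of that element, which this sketch omits.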