Bbd writeup of Borkar 2001

From Cohen Courses
Revision as of 01:16, 21 September 2009 by Bbd (talk | contribs) (→‎I liked)

This is a review of Borkar_2001_Automatic_Segmentation_of_Text_Into_Structured_Records by user:bbd.

This paper presents an HMM-based technique for segmenting unformatted text, such as addresses and citation data, into structured records.

I liked

  • I liked the way they solve the problem with a two-level hierarchical HMM. The outer HMM captures the sequencing relationships between elements, and the inner HMMs learn the finer structure within each element. Using HMMs makes the model robust to changes in the data, efficient, and easy to understand and tune.
  • They also suggest a modification to the Viterbi algorithm to incorporate an external database into the HMM.
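
To make the segmentation idea concrete, here is a minimal sketch of standard Viterbi decoding over a flat, outer-level HMM for a toy address. It is not the paper's nested model and does not include their database-aware modification. The state names, the probabilities, the token features, and the helper names (`feature`, `viterbi`, `segment`) are all assumptions made up for illustration.

```python
import math

# Hypothetical element types for a toy address; not taken from the paper.
STATES = ["HouseNo", "Street", "City"]

# Made-up start, transition, and emission probabilities (illustrative only).
START = {"HouseNo": 0.8, "Street": 0.15, "City": 0.05}
TRANS = {
    "HouseNo": {"HouseNo": 0.1, "Street": 0.8, "City": 0.1},
    "Street": {"HouseNo": 0.05, "Street": 0.55, "City": 0.4},
    "City": {"HouseNo": 0.05, "Street": 0.05, "City": 0.9},
}
EMIT = {
    "HouseNo": {"digit": 0.9, "street_word": 0.05, "other": 0.05},
    "Street": {"digit": 0.1, "street_word": 0.5, "other": 0.4},
    "City": {"digit": 0.05, "street_word": 0.05, "other": 0.9},
}
STREET_WORDS = {"street", "road", "avenue", "lane"}


def feature(token):
    """Map a token to a coarse feature class (an assumed simplification)."""
    if token.isdigit():
        return "digit"
    if token.lower() in STREET_WORDS:
        return "street_word"
    return "other"


def viterbi(tokens):
    """Return the most likely state sequence for tokens (log-space Viterbi)."""
    feats = [feature(t) for t in tokens]
    # score[s] = best log-probability of any path ending in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][feats[0]]) for s in STATES}
    backptrs = []
    for f in feats[1:]:
        new_score, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][f])
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segment(text):
    """Group whitespace-separated tokens into fields by their decoded state."""
    tokens = text.split()
    record = {}
    for token, state in zip(tokens, viterbi(tokens)):
        record[state] = (record[state] + " " + token) if state in record else token
    return record


print(segment("221 Baker Street London"))
```

With these toy parameters, the decoder labels "221" as the house number, "Baker Street" as the street, and "London" as the city. In the paper's setup each outer state would itself be an inner HMM, and the parameters would be learned from labeled training data rather than written by hand.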