Bbd writeup of Borkar 2001

From Cohen Courses

This is a review of Borkar_2001_Automatic_Segmentation_of_Text_Into_Structured_Records by user:bbd.

This paper presents an HMM-based technique for segmenting unformatted text, such as addresses and citation data, into structured records.

I liked

  • The way they solve the problem with a two-level hierarchical HMM. The outer HMM captures the sequencing relationships between elements, and the inner HMMs learn the finer structure within each element. Using HMMs makes the model robust to changes in the data, efficient, and easy to understand and tune.
  • They also suggest a modification to the Viterbi algorithm that incorporates an external database into the HMM.
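
To make the second point concrete, here is a minimal sketch of Viterbi decoding with a hook for external knowledge. It is not the paper's exact algorithm: the `allowed` callback, the three states, the toy probabilities, and the `KNOWN_CITIES` set are all assumptions made up for illustration, and the sketch uses a single flat HMM rather than the nested one. The idea shown is only that a database lookup can veto state/token pairs during decoding.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, allowed=None):
    """Return the most likely state path for `tokens`.

    `allowed(state, token)` is a hypothetical hook: when it returns False,
    that state/token pairing is ruled out (log-probability of -inf).
    """
    def score(state, token):
        if allowed is not None and not allowed(state, token):
            return -math.inf
        # Unseen tokens get a small smoothing probability (an assumption).
        return math.log(emit_p[state].get(token, 1e-6))

    # best[i][s] = (log-prob of the best path ending in s at token i, back-pointer)
    best = [{s: (math.log(start_p[s]) + score(s, tokens[0]), None) for s in states}]
    for tok in tokens[1:]:
        row = {}
        for s in states:
            prev, lp = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            row[s] = (lp + score(s, tok), prev)
        best.append(row)

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

# Toy address model; every number below is invented for the example.
STATES = ["house_no", "street", "city"]
START = {"house_no": 0.8, "street": 0.1, "city": 0.1}
TRANS = {
    "house_no": {"house_no": 0.1, "street": 0.6, "city": 0.3},
    "street": {"house_no": 0.05, "street": 0.6, "city": 0.35},
    "city": {"house_no": 0.05, "street": 0.05, "city": 0.9},
}
EMIT = {
    "house_no": {"221": 0.9},
    "street": {"street": 0.5, "baker": 0.05},
    "city": {"london": 0.5, "baker": 0.4},
}

# Stand-in for the external database: a set of known city names.
KNOWN_CITIES = {"london"}

def city_in_database(state, token):
    # Only tokens found in the database may be labelled as a city.
    return state != "city" or token in KNOWN_CITIES

tokens = ["221", "baker", "london"]
plain = viterbi(tokens, STATES, START, TRANS, EMIT)
constrained = viterbi(tokens, STATES, START, TRANS, EMIT, allowed=city_in_database)
print(plain)        # ['house_no', 'city', 'city']
print(constrained)  # ['house_no', 'street', 'city']
```

In this toy model the plain decoder mislabels "baker" as a city because of its skewed emission probabilities, and the database check pushes it to the street label. A hard veto is the simplest possible choice here; a softer penalty would be another option.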