Liuliu writeup of Borkar et al.
This is a review of Borkar 2001, Automatic Segmentation of Text Into Structured Records, by user:Liuliu.
I like this paper a lot.
It introduces an enhanced two-level HMM that captures both the sequential relationships between elements and the sequential relationships between words within an element. It automatically learns the taxonomy using pruning. External knowledge of the domain is creatively integrated into the model by modifying the Viterbi algorithm.
What's more, they give a very solid evaluation: they not only compare against other systems but also quantify the benefit of each of their enhancements, which is convincing. They also demonstrate the generality of the model by experimenting on another data set.
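To make the idea of folding external knowledge into Viterbi decoding concrete, here is a minimal sketch of my own. It is not the paper's exact modification. It assumes the domain knowledge can be expressed as a hypothetical predicate `allowed(position, state, token)` that rules out some state assignments, and it uses a flat toy HMM rather than the two-level model. The state names, probabilities, and the `known_cities` dictionary are all made up for illustration.

```python
import math

def constrained_viterbi(obs, states, log_start, log_trans, log_emit, allowed):
    """Return the most likely state path for obs, considering only
    (position, state, token) assignments accepted by allowed().
    Returns None if no path satisfies the constraints."""
    if not obs:
        return []
    neg_inf = float("-inf")
    # best[t][s]: best log-probability of a valid path ending in state s at position t
    best = [{s: neg_inf for s in states} for _ in obs]
    back = [{s: None for s in states} for _ in obs]
    for s in states:
        if allowed(0, s, obs[0]):
            best[0][s] = log_start[s] + log_emit[s].get(obs[0], neg_inf)
    for t in range(1, len(obs)):
        for s in states:
            if not allowed(t, s, obs[t]):
                continue  # external knowledge forbids this assignment
            emit = log_emit[s].get(obs[t], neg_inf)
            for p in states:
                score = best[t - 1][p] + log_trans[p][s] + emit
                if score > best[t][s]:
                    best[t][s] = score
                    back[t][s] = p
    last = max(states, key=lambda s: best[-1][s])
    if best[-1][last] == neg_inf:
        return None
    # Follow back-pointers from the final state to recover the path.
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy address model (all numbers are illustrative assumptions).
states = ["street", "city"]
log_start = {"street": math.log(0.6), "city": math.log(0.4)}
log_trans = {p: {s: math.log(0.5) for s in states} for p in states}
log_emit = {
    "street": {"main": math.log(0.5), "springfield": math.log(0.5)},
    "city": {"main": math.log(0.1), "springfield": math.log(0.9)},
}
tokens = ["main", "springfield"]

# Hypothetical external knowledge: a token may be a city only if it is in this list.
known_cities = {"shelbyville"}

def no_constraint(t, state, token):
    return True

def city_dictionary(t, state, token):
    return state != "city" or token in known_cities

unconstrained = constrained_viterbi(tokens, states, log_start, log_trans, log_emit, no_constraint)
constrained = constrained_viterbi(tokens, states, log_start, log_trans, log_emit, city_dictionary)
```

In this toy run the unconstrained decoder labels "springfield" as a city, while the dictionary constraint forces the decoder to fall back on the best path that keeps it inside the street element. Pruning forbidden cells inside the dynamic program, rather than filtering the output afterwards, means the result is still the highest-scoring path among those consistent with the external knowledge.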