Rbosaghz writeup of Borkar et al. 2001

From Cohen Courses

This is a review of Borkar_2001_Automatic_Segmentation_of_Text_Into_Structured_Records by user:Rbosaghz.

In this paper the authors develop a method for automatically segmenting unformatted text records into structured elements. This is a very general problem: the unformatted text can come from sources as diverse as addresses, bibliography records, and classified ads. They use HMMs with the usual Viterbi algorithm for decoding. The intuitive modeling setup is to have each candidate tag be a hidden state and the tokens of the unformatted text be the observations. The paper was novel in that it left behind the rule-based approaches that were usual for this task.
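
To make the modeling setup concrete, here is a minimal sketch of Viterbi decoding on a toy address record. It is an illustration only, not the authors' implementation: the three tags, the coarse token symbols (`NUM`, `WORD`, `SUFFIX`), the `symbol` helper, and every probability below are assumptions invented for this example, whereas in the paper the model parameters would be learned from training data.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for obs under a first-order HMM."""
    # Work in log space to avoid underflow; zero-probability events map to -inf.
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(obs[0], 0)) for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p].get(s, 0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lg(emit_p[s].get(obs[t], 0))
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical helper: map each raw token to a coarse observation symbol.
STREET_SUFFIXES = {"street", "road", "avenue"}

def symbol(token):
    if token.isdigit():
        return "NUM"
    if token.lower() in STREET_SUFFIXES:
        return "SUFFIX"
    return "WORD"

# Toy model: hidden states are the candidate tags (all numbers are made up).
states = ["HouseNo", "Street", "City"]
start_p = {"HouseNo": 0.8, "Street": 0.15, "City": 0.05}
trans_p = {
    "HouseNo": {"HouseNo": 0.1, "Street": 0.85, "City": 0.05},
    "Street": {"Street": 0.6, "City": 0.4},
    "City": {"City": 1.0},
}
emit_p = {
    "HouseNo": {"NUM": 0.9, "WORD": 0.1},
    "Street": {"NUM": 0.05, "WORD": 0.55, "SUFFIX": 0.4},
    "City": {"WORD": 0.95, "SUFFIX": 0.05},
}

tokens = "221 Baker Street London".split()
tags = viterbi([symbol(tok) for tok in tokens], states, start_p, trans_p, emit_p)
print(list(zip(tokens, tags)))
# [('221', 'HouseNo'), ('Baker', 'Street'), ('Street', 'Street'), ('London', 'City')]
```

Consecutive tokens that receive the same tag form one structured element, so the record above would be segmented into a house number, a street, and a city.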