Rbalasbu writeup of Borkar et al.

From Cohen Courses

A review of Borkar_2001_Automatic_Segmentation_of_Text_Into_Structured_Records by user:rbalasub

The authors present DATAMOLD, a tool that uses a nested HMM to segment free text into structured records with named fields. On address decomposition, the technique shows a significant improvement over domain-specific, rule-based systems. The proposed model also allows external databases to be used as a source of domain knowledge.
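To make the segmentation idea concrete, here is a minimal Python sketch of HMM-based record segmentation. It is not DATAMOLD's model. It collapses each field to a single HMM state, whereas the nested model gives each field its own inner HMM. Such a nested model could in principle be flattened into one HMM whose states are (field, inner state) pairs and decoded the same way. The field names, the token-to-symbol mapping in `symbol`, and all probabilities are hypothetical toy values, not parameters learned from labeled data. The decoder is standard log-space Viterbi.

```python
import math

# Hypothetical fields. In a nested HMM each field would own an inner HMM;
# here each field is collapsed to a single state.
FIELDS = ["house_no", "street", "city", "zip"]

# Toy probabilities, invented for illustration (not learned from data).
START = {"house_no": 0.7, "street": 0.2, "city": 0.05, "zip": 0.05}
TRANS = {
    "house_no": {"house_no": 0.1, "street": 0.8, "city": 0.05, "zip": 0.05},
    "street": {"house_no": 0.01, "street": 0.6, "city": 0.35, "zip": 0.04},
    "city": {"house_no": 0.01, "street": 0.01, "city": 0.5, "zip": 0.48},
    "zip": {"house_no": 0.01, "street": 0.01, "city": 0.01, "zip": 0.97},
}
# Emissions are over coarse token symbols rather than raw words.
EMIT = {
    "house_no": {"NUM": 0.8, "NUM5": 0.1, "SUFFIX": 0.01, "WORD": 0.09},
    "street": {"NUM": 0.05, "NUM5": 0.01, "SUFFIX": 0.3, "WORD": 0.64},
    "city": {"NUM": 0.01, "NUM5": 0.01, "SUFFIX": 0.01, "WORD": 0.97},
    "zip": {"NUM": 0.05, "NUM5": 0.93, "SUFFIX": 0.01, "WORD": 0.01},
}
SUFFIXES = {"st", "street", "ave", "avenue", "rd", "road"}


def symbol(token):
    """Map a raw token to a coarse symbol (a hypothetical feature set)."""
    t = token.lower().strip(".,")
    if t.isdigit():
        return "NUM5" if len(t) == 5 else "NUM"
    if t in SUFFIXES:
        return "SUFFIX"
    return "WORD"


def viterbi(tokens):
    """Return the most likely field label for each token (log-space Viterbi)."""
    obs = [symbol(t) for t in tokens]
    # score[f]: best log-probability of a path ending in field f so far.
    score = {f: math.log(START[f]) + math.log(EMIT[f][obs[0]]) for f in FIELDS}
    backptrs = []
    for o in obs[1:]:
        ptr, new = {}, {}
        for f in FIELDS:
            prev = max(FIELDS, key=lambda p: score[p] + math.log(TRANS[p][f]))
            ptr[f] = prev
            new[f] = score[prev] + math.log(TRANS[prev][f]) + math.log(EMIT[f][o])
        backptrs.append(ptr)
        score = new
    # Trace back from the best final field.
    path = [max(FIELDS, key=score.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


def segment(text):
    """Split whitespace-tokenized text into a {field: text} record."""
    tokens = text.split()
    record = {}
    for tok, label in zip(tokens, viterbi(tokens)):
        record.setdefault(label, []).append(tok)
    return {k: " ".join(v) for k, v in record.items()}


if __name__ == "__main__":
    print(segment("221 Baker St Pittsburgh 15213"))
```

With these toy parameters, `segment("221 Baker St Pittsburgh 15213")` returns `{'house_no': '221', 'street': 'Baker St', 'city': 'Pittsburgh', 'zip': '15213'}`. Mapping tokens to coarse symbols, rather than emitting raw words, is one simple way to let a small model generalize to unseen street and city names. A real system would learn the transition and emission tables from labeled records.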

The hierarchical HMM idea is very interesting. So is their section on feature selection.