Rbalasub writeup of Jansche and Abney

From Cohen Courses
A review of Jansche_2002_information_extraction_from_voicemail_transcripts by user:rbalasub

The authors address the task of extracting caller phrases and telephone numbers from voicemail transcripts. They also address the subsidiary task of extracting caller names, which are usually contained in caller phrases. For caller phrases, a simple decision tree learner is trained using features that indicate the word index at which the caller phrase starts and the length of the phrase. The emphasis is on general features, avoiding lexical features that may cause problems on automatically generated transcripts. For phone numbers, a two-stage process is used: a grammar first proposes candidates, and a decision tree then weeds out the incorrect ones. Overall, the authors take a very data-driven approach, trying simple heuristics before resorting to heavyweight methods. The two-stage design, a high-recall proposal step followed by filtering of bad matches, seems practically appealing.
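
To make the two-stage idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. The digit vocabulary, the function names, and the length rule are all assumptions made for illustration. In particular, stage 1 uses a trivial "run of digit words" matcher in place of the paper's grammar, and stage 2 uses a hand-written rule as a stand-in for the learned decision tree.

```python
# Hypothetical digit vocabulary: transcripts spell numbers out as words.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def propose_candidates(words):
    """Stage 1 (high recall): return (start, end) spans for every
    maximal run of digit words. Stand-in for the grammar."""
    candidates = []
    start = None
    # The empty-string sentinel flushes a run that ends the transcript.
    for i, word in enumerate(words + [""]):
        if word in DIGITS:
            if start is None:
                start = i
        elif start is not None:
            candidates.append((start, i))
            start = None
    return candidates


def keep_candidate(words, start, end):
    """Stage 2 (filter): assumed rule that keeps only spans of 7 or 10
    digits. A trained classifier would replace this function."""
    return (end - start) in (7, 10)


def extract_phone_numbers(transcript):
    """Run both stages and return the surviving numbers as digit strings."""
    words = transcript.lower().split()
    numbers = []
    for start, end in propose_candidates(words):
        if keep_candidate(words, start, end):
            numbers.append("".join(DIGITS[w] for w in words[start:end]))
    return numbers


if __name__ == "__main__":
    message = ("hi this is john call me at five five five one two "
               "three four i called at two thirty")
    print(extract_phone_numbers(message))
```

In this example, stage 1 proposes both the seven-word run and the lone "two" from "two thirty", and stage 2 discards the latter. Keeping the proposal step permissive and putting the precision burden on the filter is the design choice the review finds appealing.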

On the flip side, there is nothing very novel about the approaches. The task is also probably not very relevant today.