Rbosaghz writeup of Jansche and Abney

From Cohen Courses
Revision as of 11:42, 3 September 2010 by WikiAdmin.

This is a review of Jansche_2002_information_extraction_from_voicemail_transcripts by user:Rbosaghz.

This paper discusses methods for extracting phone numbers and caller names from voicemail messages. The task resembles named-entity recognition on broadcast news, except that voicemail, which is typically informal and unscripted, is a very different domain from broadcast news.

They describe a phone-number extraction system that uses hard-coded rules to extract candidate numbers and then applies a classifier to filter out the undesirable ones.
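The review does not spell out the rules, so the following is only a minimal sketch of the two-stage design, assuming transcripts spell digits out as words. `DIGIT_WORDS`, `extract_candidates`, `keep_candidate`, and `extract_phone_numbers` are hypothetical names, and `keep_candidate` is a simple length check standing in for the trained classifier the paper describes.

```python
import re
from typing import Callable, List

# Hypothetical mapping from spoken digit words to digits (not the paper's rules).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}

def extract_candidates(transcript: str) -> List[str]:
    """Rule-based stage: every maximal run of digit tokens becomes a candidate."""
    candidates: List[str] = []
    run: List[str] = []
    for token in re.findall(r"[a-z0-9]+", transcript.lower()):
        if token in DIGIT_WORDS:
            run.append(DIGIT_WORDS[token])
        elif token.isdigit():
            run.append(token)
        else:
            if run:  # a non-digit token closes the current run
                candidates.append("".join(run))
            run = []
    if run:
        candidates.append("".join(run))
    return candidates

def keep_candidate(number: str) -> bool:
    """Placeholder filter: keep 7- or 10-digit strings (a classifier would go here)."""
    return len(number) in (7, 10)

def extract_phone_numbers(transcript: str,
                          accept: Callable[[str], bool] = keep_candidate) -> List[str]:
    """Filtering stage: keep only the candidates the filter accepts."""
    return [c for c in extract_candidates(transcript) if accept(c)]

message = "hi it's john about the one o'clock meeting call me back at five five five one two one two"
print(extract_candidates(message))     # ['1', '5551212']
print(extract_phone_numbers(message))  # ['5551212']
```

The stray "one" in "one o'clock" shows why a second stage is useful: permissive rules over-generate candidates, and the filter presumably restores precision without making the rules themselves complicated.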

They use a corpus of 10,000 manually transcribed voicemail messages, which sidesteps automatic speech recognition, a daunting task in its own right. Their classifier is a log-linear model that takes into account the position of a named entity within the message (callers usually introduce themselves at the beginning).
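To make the position idea concrete, here is a toy sketch of a binary log-linear (logistic regression) scorer that asks, for each token, whether the caller's name starts there. The relative-position feature reflects the review's point that callers introduce themselves early; the other features, all function names, and the two training messages are made up for illustration and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]

def features(tokens: List[str], i: int) -> Features:
    """Hypothetical features for 'does the caller's name start at tokens[i]?'."""
    prev = tokens[max(0, i - 2):i]
    return {
        "bias": 1.0,
        # Relative position: 0.0 = first token of the message, 1.0 = last token.
        "rel_position": i / max(1, len(tokens) - 1),
        "after_this_is": 1.0 if prev == ["this", "is"] else 0.0,
        "after_its": 1.0 if prev[-1:] == ["it's"] else 0.0,
    }

def prob(weights: Features, feats: Features) -> float:
    """Log-linear model: P(y=1 | x) = 1 / (1 + exp(-w . f(x)))."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(examples: List[Tuple[Features, int]], epochs: int = 200, lr: float = 0.5) -> Features:
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights: Features = {}
    for _ in range(epochs):
        for feats, label in examples:
            error = label - prob(weights, feats)
            for name, value in feats.items():
                weights[name] = weights.get(name, 0.0) + lr * error * value
    return weights

def labelled_tokens(message: str, name_start: int) -> List[Tuple[Features, int]]:
    """Label the token where the caller's name starts as 1, every other token as 0."""
    tokens = message.split()
    return [(features(tokens, i), int(i == name_start)) for i in range(len(tokens))]

def most_likely_name_start(weights: Features, message: str) -> int:
    tokens = message.split()
    return max(range(len(tokens)), key=lambda i: prob(weights, features(tokens, i)))

# Toy, made-up training data (not from the paper's corpus).
train_set = (labelled_tokens("hi this is john smith please call me back", 3)
             + labelled_tokens("hey it's mary just checking in about the meeting with john", 2))
weights = train(train_set)
start = most_likely_name_start(weights, "hello this is dave call me when you can")
print(start, "hello this is dave call me when you can".split()[start])
```

Because the score is just a weighted sum of features passed through the logistic function, trying out another cue (say, a lexicon of first names) only means adding one more entry to the dictionary returned by `features`.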

I did not find this paper particularly exciting, but it was likely novel in 2002.