Rbosaghz writeup of Jansche and Abney
This is a review of Jansche_2002_information_extraction_from_voicemail_transcripts by user:Rbosaghz.
This paper discusses methods for extracting phone numbers and the caller's name from voicemail transcripts. The task is similar to named entity recognition on broadcast news, except that voicemail is a very different domain from broadcast news.
They describe a phone-number extraction system that uses hard-coded rules to extract candidates, then uses a classifier to filter out the undesirable numbers.
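To make the two-stage idea concrete, here is a minimal sketch of "rules propose candidates, a classifier filters them". It is not the authors' system: the digit-word list, the function names, and the length-based filter standing in for a trained classifier are all my own assumptions for illustration.

```python
# Hypothetical sketch of a two-stage phone-number extractor.
# Stage 1: a hand-written rule proposes candidates (maximal runs of digit words).
# Stage 2: a filter decides which candidates to keep. Here the filter is a
# simple length check standing in for a trained classifier (an assumption).

DIGIT_WORDS = {"zero", "oh", "one", "two", "three", "four", "five",
               "six", "seven", "eight", "nine"}


def extract_candidates(transcript):
    """Return the tokens and the (start, end) spans of maximal digit-word runs."""
    tokens = transcript.lower().split()
    spans = []
    start = None
    for i, tok in enumerate(tokens):
        if tok in DIGIT_WORDS:
            if start is None:
                start = i          # a new run of digit words begins here
        elif start is not None:
            spans.append((start, i))  # the run ended at the previous token
            start = None
    if start is not None:
        spans.append((start, len(tokens)))  # run reaches the end of the message
    return tokens, spans


def keep_candidate(tokens, span):
    """Stand-in for the classifier: keep only runs of 7 or 10 digit words."""
    start, end = span
    return (end - start) in (7, 10)


def extract_phone_numbers(transcript):
    """Run both stages and return the surviving candidates as strings."""
    tokens, spans = extract_candidates(transcript)
    return [" ".join(tokens[s:e]) for s, e in spans if keep_candidate(tokens, (s, e))]
```

For example, in "call me at five five five one two three four by three today" the rule proposes two candidates, and the filter drops the lone "three" while keeping the seven-digit run.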
They use a corpus of 10000 manually transcribed voicemails, which sidesteps the automatic speech recognition task, a daunting problem in itself. Their classifier is a log-linear model that takes into account the position of a named entity, since callers usually introduce themselves at the beginning of a message.
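As a rough illustration of how a log-linear model can use position, the sketch below scores a token as a possible caller name with a binary log-linear (logistic) model. The feature set and the weights are invented for this example and are not taken from the paper; in practice the weights would be learned from the annotated corpus.

```python
import math

# Hypothetical, hand-picked weights; a real model would learn these from data.
WEIGHTS = {
    "bias": -1.0,
    "near_start": 2.0,          # rewards tokens close to the start of the message
    "after_intro_phrase": 1.5,  # rewards tokens right after "this is"
}


def features(tokens, index):
    """Build the feature vector for the token at `index` (illustrative only)."""
    rel_pos = index / max(len(tokens) - 1, 1)    # 0.0 at the start, 1.0 at the end
    prev_two = tuple(tokens[max(index - 2, 0):index])
    return {
        "bias": 1.0,
        "near_start": 1.0 - rel_pos,
        "after_intro_phrase": 1.0 if prev_two == ("this", "is") else 0.0,
    }


def caller_name_probability(tokens, index):
    """P(token is the caller's name) under a binary log-linear model."""
    feats = features(tokens, index)
    score = sum(WEIGHTS[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))        # logistic function of the score
```

With these made-up weights, "bob" in "hi this is bob please call alice back" scores higher than "alice", because it appears earlier and follows an introduction phrase.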
This paper was not particularly exciting for me, but it was likely novel in 2002.