Sgardine writeup of Jansche and Abney

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

This is a review of Jansche_2002_information_extraction_from_voicemail_transcripts by user:sgardine.

Empirical examination of the voicemail corpus corroborated the common-sense observation that callers tend to identify themselves briefly and at the beginning of the message. A simple classifier based on these observations was competitive with the previous state-of-the-art tuned log-linear model, and outperformed it on novel, automatically recognized messages. The authors speculate that the log-linear models over-fit the domain, i.e. the caller-identification observations generalize better to the new ASR distribution. If the only goal is minimizing error rate, it seems like the log-linear model could easily incorporate the Jansche and Abney (JA) method as just another feature, though JA is probably simpler and faster.
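To make the "early and brief" observation concrete, here is a minimal sketch of how it could act as a filter over candidate caller phrases. It is an illustration only, not the classifier from the paper: the function name `plausible_caller_spans` and the thresholds `max_start` and `max_len` are hypothetical and chosen arbitrarily.

```python
def plausible_caller_spans(spans, max_start=10, max_len=5):
    """Keep candidate (start, end) token spans that begin early and are short.

    `spans` holds half-open token index pairs. Both thresholds are
    made-up illustration values, not figures taken from the paper.
    """
    return [
        (start, end)
        for start, end in spans
        if start < max_start and (end - start) <= max_len
    ]


# Three candidates: early and short, late and short, early but long.
candidates = [(2, 4), (30, 32), (1, 12)]
kept = plausible_caller_spans(candidates)
```

Only the first candidate survives here. A learned model would presumably replace the hard thresholds with weights estimated from data.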

Phone numbers were extracted in two passes: a hand-crafted grammar first converted sequences of number words to sequences of digits (e.g. "six hundred" to "600"), and a classifier then labeled the resulting digit sequences. A few simple features, including triggers and the length of the number, sufficed to outperform the previous classifiers, and the results held up on the ASR distribution as well.
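The two-pass idea can be sketched roughly as follows. This is a toy stand-in under stated assumptions, not the authors' grammar or classifier: the word list `DIGITS`, the trigger set `TRIGGERS`, the accepted lengths, and the context window are all hypothetical, and the second pass is a hand-written rule rather than a trained model.

```python
# Pass 1 vocabulary: a toy mapping, far smaller than a real grammar.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}

# Hypothetical trigger words that tend to precede a phone number.
TRIGGERS = {"call", "reach", "number", "at", "extension"}


def words_to_digits(tokens):
    """Pass 1: rewrite each run of number words as one digit string."""
    out, run = [], ""
    for tok in tokens:
        word = tok.lower()
        if word in DIGITS:
            run += DIGITS[word]
        elif word == "hundred" and run:
            run += "00"  # e.g. "six hundred" becomes "600"
        else:
            if run:
                out.append(run)
                run = ""
            out.append(tok)
    if run:
        out.append(run)
    return out


def looks_like_phone_number(tokens, i, window=3):
    """Pass 2: rule-based stand-in for the digit-sequence classifier.

    Uses two features: the length of the digit string and whether a
    trigger word appears in the preceding `window` tokens.
    """
    tok = tokens[i]
    if not tok.isdigit():
        return False
    good_length = len(tok) in (4, 7, 10, 11)  # assumed North American lengths
    context = tokens[max(0, i - window):i]
    has_trigger = any(t.lower() in TRIGGERS for t in context)
    return good_length and has_trigger


message = "please call me at five five five one two one two thanks".split()
converted = words_to_digits(message)
is_phone = looks_like_phone_number(converted, converted.index("5551212"))
```

Separating the passes keeps the number-word conversion independent of the classification step, so the features can be defined over plain digit strings.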

The relationship between the ASR and transcribed data was unclear. Presumably the underlying voice messages are drawn from the same distribution (i.e. the corpus used in the Bacchiani paper for building the ASR system), but it seems like the mistakes of ASR and transcription are distinct. Maybe all that this paper does is delineate some features that survive the noise of ASR better. It would have been interesting to train some of the models on ASR data and directly compare the differences due to that dimension alone, though that would involve some (potentially noisy) work in transferring the labelling from the transcribed data to the ASR messages.

It seems to me that the features used for caller extraction here (callers identify themselves early and briefly) would generalize to voicemails in other languages and cultures. The heuristics about phone number length (and, to a lesser extent, caller name length) would have to be rethought for each new setting outside of North America.