Yandongl writeup of Jansche 2002

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin

This is a review of Jansche_2002_information_extraction_from_voicemail_transcripts by user:Yandongl.


In this paper, the authors extract caller identities and telephone numbers from voicemail transcripts so that a user does not have to listen to the entire message. The goal is quite similar to that of Huang (2001), which this paper cites frequently; the features, however, are quite different.

  • Caller: Positional cues are a strong indicator of caller phrases. The authors applied Col log-linear taggers, whose performance is worse than that of the other methods, though not by much. After applying the approach to another corpus, the authors claim that it generalizes well and works on other corpora, too.
  • Phone number: The baseline simply finds all maximal substrings consisting of spoken digits. A trigram approach does not work, and numbers may also be spoken in different ways. The authors propose a two-step approach: 1. use a hand-crafted grammar to propose candidate phone numbers; 2. accept only candidates of length at least four, which significantly improves precision without degrading recall.
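
To make the phone-number bullet concrete, here is a minimal Python sketch of the baseline and of the two-step idea. It assumes whitespace-tokenized transcripts. The digit lexicon, the "double"/"triple" rule and the names `digit_runs` and `extract_phone_numbers` are hypothetical stand-ins for illustration, not the authors' grammar, and counting the length-four threshold in digits is also an assumption.

```python
# Toy lexicon of spoken digit words (an assumption for illustration;
# the paper's actual hand-crafted grammar is not reproduced here).
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}
# Hypothetical extra rule standing in for the grammar: "double X" -> "XX".
MULTIPLIERS = {"double": 2, "triple": 3}


def digit_runs(tokens, expand_multipliers=False):
    """Return every maximal run of spoken digits as a digit string.

    With expand_multipliers=False this is the baseline (maximal substrings
    of spoken digits); with True it also reads "double X" / "triple X".
    """
    words = [t.lower() for t in tokens]
    runs, current = [], []
    i = 0
    while i < len(words):
        w = words[i]
        if (expand_multipliers and w in MULTIPLIERS
                and i + 1 < len(words) and words[i + 1] in DIGIT_WORDS):
            # "double two" contributes "22" to the current run
            current.append(DIGIT_WORDS[words[i + 1]] * MULTIPLIERS[w])
            i += 2
            continue
        if w in DIGIT_WORDS:
            current.append(DIGIT_WORDS[w])
        elif current:
            # a non-digit token closes the current maximal run
            runs.append("".join(current))
            current = []
        i += 1
    if current:
        runs.append("".join(current))
    return runs


def extract_phone_numbers(tokens, min_len=4):
    """Two-step sketch: (1) propose candidates, (2) keep those with >= min_len digits."""
    candidates = digit_runs(tokens, expand_multipliers=True)
    return [c for c in candidates if len(c) >= min_len]


if __name__ == "__main__":
    msg = "hi it's john call me at five five five double two one oh i'll be in at two".split()
    print(digit_runs(msg))             # baseline: ['555', '210', '2']
    print(extract_phone_numbers(msg))  # two-step: ['5552210']
```

In this toy message the baseline splits the number at "double", while the toy grammar keeps it whole; the length filter then discards the stray "two", which is the kind of false positive that hurts precision.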

Overall, the authors' approach seems to work, achieving a high F-measure. A common issue is that all of these approaches rely heavily on the accuracy of the underlying speech recognition component. Also, the methods suggested in this paper are all heuristic: a caller does not have to speak in the expected way (e.g. "Hi, this is..."). In addition, voicemail corpora from different countries or in different languages can be entirely different, and the approach may no longer work on them.
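
Since the review judges the results in terms of precision, recall and F-measure, a small sketch of these metrics may help. It assumes exact-match scoring over sets of extracted strings, which may differ from the scoring used in the paper; `precision_recall_f1` is a hypothetical helper, and the phone numbers in the example are made up.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over sets of extracted items.

    Duplicates are collapsed because both inputs are converted to sets.
    """
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # items both predicted and in the gold standard
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["5552210"]
    print(precision_recall_f1(["5552210", "2"], gold))  # unfiltered: P=0.5, R=1.0, F1~0.67
    print(precision_recall_f1(["5552210"], gold))       # filtered: P=1.0, R=1.0, F1=1.0
```

In the example, dropping a short spurious candidate raises precision from 0.5 to 1.0 while recall stays at 1.0, mirroring the effect the review attributes to the length-four filter.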