Jansche, ACL 2002

From Cohen Courses

Citation

Jansche, M. and Abney, S. P. 2002. Information extraction from voicemail transcripts. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10.

Online version

CiteSeer

Summary

This paper studies the specific task of extracting caller names and phone numbers from voicemail transcripts. The main ideas are:

  • Classify caller phrases and caller names with decision trees, using a small number of simple features.
 Features: common words such as 'hi', 'this' and 'is'; positional information; and the length of the name.
  • Caller name extraction is shown to be an easier task than caller phrase extraction.
  • Extract phone numbers with a two-step procedure: 1) convert numbers into digit form and select candidate numbers with a rule-based system (the paper does not describe the rules); 2) classify the candidates with decision trees, using their length, their distance to the end of the message, and lexical cues in the surrounding context.
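A minimal sketch of the kind of feature vector the caller-name bullet describes, computed for one candidate span of a tokenized transcript. The cue-word tuple, the three-token window, and the name `caller_span_features` are illustrative assumptions, not details taken from the paper; the resulting dictionaries could be handed to any decision-tree learner.

```python
# Hypothetical cue-word list; the summary only names 'hi', 'this' and 'is'.
CUE_WORDS = ("hi", "this", "is")


def caller_span_features(tokens, start, end, window=3):
    """Simple features for the candidate caller span tokens[start:end].

    Covers the three feature types listed above: cue words in the left
    context, positional information, and the length of the name.
    """
    if not (0 <= start < end <= len(tokens)):
        raise ValueError("invalid span")
    # Lower-cased tokens in a small window immediately before the span.
    left = [t.lower() for t in tokens[max(0, start - window):start]]
    feats = {
        "length": end - start,                  # length of the name in tokens
        "start": start,                         # absolute position in message
        "relative_start": start / len(tokens),  # position as fraction of message
    }
    for w in CUE_WORDS:
        # 1 if the cue word occurs in the left window, else 0.
        feats[f"cue_{w}_before"] = int(w in left)
    return feats
```

For "hi jane this is bob smith calling about the meeting", the span "bob smith" (tokens 4 to 6) is preceded by "this is", so both of those cue features fire while "hi" falls outside the window.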

The approach was shown to generalize better than that of a previous paper (Huang, ACL 2001) on the ASR data set.

Questions

Since a small number of features already works well for this task, why not use a richer feature set with an SVM? Of course, as the paper suggests, features such as n-grams or the names themselves should be avoided when extracting caller names.

The rule-based system that selects phone-number candidates also deserves a more detailed description. Perhaps it simply does the simplest thing and outputs every run of consecutive numbers.
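A minimal sketch of that "simplest thing", assuming step 1 maps spelled-out digits to numerals and then outputs every maximal run of consecutive number tokens. The word-to-digit table, the cue words, the window size, and the names `to_digit` and `phone_candidates` are all hypothetical. Each candidate carries the three kinds of step-2 features (length, distance to the end of the message, a context cue), so a decision tree is still needed to reject runs that are not phone numbers.

```python
# Hypothetical mapping; real ASR output may need more entries (e.g. "double").
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
# Hypothetical context cues; the summary only says "lexical cues in the context".
CONTEXT_CUES = {"number", "call", "reach", "at"}


def to_digit(token):
    """Return the digit string for a number token, or None otherwise."""
    t = token.lower()
    if t.isdecimal():
        return t
    return DIGIT_WORDS.get(t)


def phone_candidates(tokens, window=3):
    """Return every maximal run of consecutive number tokens with features."""
    candidates = []
    n = len(tokens)
    i = 0
    while i < n:
        if to_digit(tokens[i]) is None:
            i += 1
            continue
        # Extend the run while tokens keep converting to digits.
        j = i
        while j < n and to_digit(tokens[j]) is not None:
            j += 1
        digits = "".join(to_digit(t) for t in tokens[i:j])
        left = {t.lower() for t in tokens[max(0, i - window):i]}
        candidates.append({
            "digits": digits,
            "length": len(digits),                        # number of digits
            "tokens_to_end": n - j,                       # distance to message end
            "has_cue": int(bool(left & CONTEXT_CUES)),    # lexical cue before run
        })
        i = j
    return candidates
```

For "please call me at five five five one two one two before 3 pm" this yields two candidates, `5551212` and `3`; only the second-stage classifier can tell that the second one is a time rather than a phone number.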