Difference between revisions of "Jansche 2002 Information extraction from voicemail transcripts"

From Cohen Courses

Revision as of 21:28, 23 October 2010

Citation

Jansche, M. and Abney, S. 2002. Information Extraction from Voicemail Transcripts. In Proceedings of ACL-EMNLP.

Online version

An online version of this paper is available [1].

Summary

This paper introduces a simple yet effective way to extract the caller phrase/name and the phone number from voicemail transcripts. The authors present detailed empirical results and statistics drawn from the corpus.

Key Contributions

The paper presents the following key findings on the trade-off between a simple heuristic-based classifier and a sophisticated n-gram-based classifier:

  • The n-gram-based classifier may overfit seriously and is prone to errors on unseen attribute values (for example, in ASR output).
  • The heuristic-based classifier exploits both linguistic intuitions and empirical distributions, so it can rely on strong heuristics and a simple classifier. The paper also introduces an interesting two-phase approach to this learning problem.

Introduction

The paper first states that the main information extraction problem for voicemail is identifying the caller and, if available, a callback number. Without extraction, finding this information takes a person 36 seconds on average, since he or she has to listen to the whole voicemail.

This paper focuses only on transcribed voicemail text and does not include a speech recognition front-end. However, the task still differs from traditional Named Entity Recognition and phone number extraction for two major reasons. First, a voicemail transcript is text based on spoken language, so its linguistic elements are different (for example, 400-1425 can be spoken as "four zero zero fourteen twenty five"). Second, the structure of a voicemail can be exploited to extract caller identifications without a sophisticated structured model for NER.
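To make the spoken-number point concrete, here is a minimal Python sketch that maps a sequence of number words back to a digit string. It is an illustration only and is not the method used in the paper; the function name `spoken_to_digits` and the word tables are assumptions, and the sketch handles only digit words, teens, and tens (no "hundred", "double", and so on).

```python
# Hypothetical helper, not from the paper: converts spoken number words
# (as they appear in a voicemail transcript) into a digit string.
UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_digits(phrase):
    """Return the digit string for a whitespace-separated number phrase."""
    tokens = phrase.lower().split()
    digits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TENS:
            value = TENS[tok]
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            # "twenty five" is read as 25; "twenty oh" stays 20 then 0.
            if nxt in UNITS and UNITS[nxt] > 0:
                value += UNITS[nxt]
                i += 1
            digits.append(str(value))
        elif tok in TEENS:
            digits.append(str(TEENS[tok]))
        elif tok in UNITS:
            digits.append(str(UNITS[tok]))
        else:
            raise ValueError(f"not a number word: {tok!r}")
        i += 1
    return "".join(digits)


print(spoken_to_digits("four zero zero fourteen twenty five"))
```

The same digit string can be spoken in many ways ("one four two five", "fourteen twenty five"), which is one reason written-text phone number patterns do not transfer directly to transcripts.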

Instead of using the MaxEnt extractor with n-gram features described in previous work, this paper first presents an empirical analysis of the data and then focuses on heuristic-based features with a decision tree model.

Proprietary Voicemail Transcription Dataset

The paper uses a proprietary data set consisting of almost 10,000 voicemail messages with manual transcriptions and markup, as illustrated in the following excerpt:

<greeting> hi Jane </greeting> <caller> this is Pat Caller </caller> I just wanted to I know you’ve probably seen this or maybe you already know about it . . . so if you could give me a call at <telno> one two three four five </telno> when you get the message I’d like to chat about it hope things are well with you <closing> talk to you soon </closing>
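As a rough illustration of how such markup could be read programmatically, the following Python sketch pulls the labeled spans out of a transcript. The regular expression and the function name `extract_spans` are assumptions made for this example; the paper does not describe its data-loading code, and the untagged text in the sample string is abbreviated.

```python
import re

# Matches <label> ... </label>; the backreference \1 requires the closing
# tag to carry the same label as the opening tag.
TAG_RE = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)


def extract_spans(transcript):
    """Return (label, text) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(transcript)]


# Abbreviated version of the excerpt above (untagged filler shortened).
excerpt = (
    "<greeting> hi Jane </greeting> <caller> this is Pat Caller </caller> "
    "I just wanted to so if you could give me a call at "
    "<telno> one two three four five </telno> when you get the message "
    "<closing> talk to you soon </closing>"
)

spans = extract_spans(excerpt)
for label, text in spans:
    print(f"{label}: {text}")
```

This sketch assumes tags are not nested and are always properly closed, which holds for the excerpt shown but is not guaranteed for the full data set.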

Experiments

The authors trained four different types of models to classify lines from Usenet files into one of four categories: head, question, answer and tail. They used a set of 24 boolean features. The types of models they trained were: ME-Stateless (non-sequential Maxent), TokenHMM (a standard 4-state fully connected HMM), FeatureHMM (an HMM where the lines, i.e. the observations, were replaced by their corresponding features), and the MeMM model. They found that the MeMM outperformed the other approaches.

Related papers

The Huang et al., 2001 paper discussed a very similar problem, but from a more traditional perspective. It studied three approaches: hand-crafted rules, grammatical inference of sub-sequential transducers, and a log-linear classifier with bigram and trigram features, which is essentially the same as the one in the Ratnaparkhi, 1996 paper on Maxent POS tagging.