Mnduong project abstract

From Cohen Courses

Named Entity Recognition in Noisy Speech Output

  • In this project, I plan to work on named entity recognition in noisy speech recognizer output. I will focus on extracting entities that are often out-of-vocabulary words, such as proper names, and are therefore more prone to ASR errors. We plan to use a broadcast news dataset from the Linguistic Data Consortium, which is available to us at CMU. We have not yet decided which specific dataset to use, but we know that appropriate transcribed data is available.
  • I think this problem is interesting because it is quite different from the traditional problem of recognizing named entities in clean text. With a noisy ASR hypothesis, we cannot rely on word identity or on orthographic features extracted from the word. Thus, we will have to come up with more robust features based on the surrounding context, as well as on the acoustic confidence of each word. The output of this system could be useful for detecting speech recognition errors, by comparing the best label given by the system with the ASR hypothesis. In the larger problem of recognizing out-of-vocabulary words, this information could be used to adapt the ASR language model so that a second recognition pass produces better output.
  • My current research is on analyzing children's prosody, which requires dealing with ASR output. This gives me additional motivation for solving the problem. I also have prior experience with named entity recognition on clean text, which will help in comparing the work required by the two problems.
  • I plan to first implement a good published method to use as the baseline. From there, we hope to improve on it, through either feature engineering or changes to the model. We are still surveying the literature for good known methods. Once we know what has already been explored, we will have a better idea of what to try. To obtain labels for the training data, we will first run a vanilla NER model on the transcribed version of the data (which has no punctuation or capitalization information). This gives labels at almost the same accuracy as running on data with punctuation and capitalization, as shown by early work (Miller et al. '97).
  • From this project, we want to learn how named entity recognition on noisy speech recognizer output differs from the clean-text case: which kinds of features help or hurt, and how the model structure should change. We are also interested in how to incorporate acoustic confidence information into the model.
  • I plan to work with Aasish Pappu on this project.
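
As a rough illustration of the feature idea described above (context features plus per-word acoustic confidence), here is a minimal Python sketch. It is not the project's method: the input format (a list of word/confidence pairs with confidence in [0, 1]), the function names, the `<LOW_CONF>` placeholder, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one (word, confidence) pair per ASR token,
# with the confidence assumed to lie in [0, 1].
Token = Tuple[str, float]


def confidence_bin(conf: float, n_bins: int = 4) -> int:
    """Map a confidence in [0, 1] to a discrete bin index in 0..n_bins-1."""
    clipped = min(max(conf, 0.0), 1.0)
    return min(int(clipped * n_bins), n_bins - 1)


def mask_word(word: str, conf: float, low_conf: float) -> str:
    """Keep the word identity only when the recognizer is confident in it."""
    return word.lower() if conf >= low_conf else "<LOW_CONF>"


def token_features(tokens: List[Token], i: int, window: int = 2,
                   low_conf: float = 0.5) -> Dict[str, str]:
    """Build a feature dictionary for token i of one ASR hypothesis."""
    word, conf = tokens[i]
    feats = {
        "conf_bin": str(confidence_bin(conf)),
        "word": mask_word(word, conf, low_conf),
    }
    # Surrounding context, masked the same way as the current word.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if j < 0:
            feats[f"ctx[{offset}]"] = "<BOS>"
        elif j >= len(tokens):
            feats[f"ctx[{offset}]"] = "<EOS>"
        else:
            ctx_word, ctx_conf = tokens[j]
            feats[f"ctx[{offset}]"] = mask_word(ctx_word, ctx_conf, low_conf)
    return feats


if __name__ == "__main__":
    hypothesis = [("mister", 0.95), ("smyth", 0.30), ("said", 0.90)]
    print(token_features(hypothesis, 1, window=1))
```

Feature dictionaries of this shape could be passed to a sequence labeler such as a CRF. Masking low-confidence words is only one possible way to use confidence; using it directly as a real-valued feature is another.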

References

  • David Miller, Sean Boisen, Richard Schwartz, Rebecca Stone, Ralph Weischedel. Named Entity Extraction from Noisy Input: Speech and OCR. ACL 1997.
  • David D. Palmer, Mari Ostendorf, John D. Burger. Robust Information Extraction from Automatically Generated Speech Transcriptions. Speech Communication 32 (2000), 95-109.
  • David D. Palmer, Mari Ostendorf. Improving Information Extraction by Modeling Errors in Speech Recognizer Output. Proceedings of the First International Conference on Human Language Technology Research, 2001.
  • Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki. Named Entity Recognition from Speech Using Discriminative Models and Speech Recognition Confidence. Journal of Information Processing, Vol. 17 (Feb 2009), 72-81.