Difference between revisions of "Hall emnlp2008"

From Cohen Courses

Revision as of 16:25, 1 April 2011

Citation

  • Title : Studying the History of Ideas Using Topic Models
  • Authors : D. Hall, D. Jurafsky, and C. D. Manning
  • Venue : EMNLP 2008

Summary

This paper uses topic models to study the development of ideas over time in papers from computational linguistics conferences (ACL, COLING, EMNLP, etc.).

Dataset

ACL Anthology (~12,500 papers)

Model

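Instead of using dynamic topic models, they used a static topic model (vanilla LDA) with post hoc analysis to calculate the observed probability of each topic in a given year, computed as follows:

<math>
\hat{p}(z|y) = \sum_{d:t_d=y} \hat{p}(z|d) \hat{p}(d|y)
</math>

Here the sum runs over the documents d whose year t_d equals y.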
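
A minimal sketch of this post hoc step: the per-year topic probability is the sum of each document's topic distribution p(z|d), weighted by p(d|y), over the documents dated y. This page does not say how p(d|y) is set, so the sketch assumes it is uniform over the documents of that year, which makes the sum a plain average. The function name `topic_prob_by_year` and the toy numbers are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def topic_prob_by_year(doc_topics, doc_years):
    """Estimate p(z|y) by averaging p(z|d) over the documents dated y.

    doc_topics: list of per-document topic distributions p(z|d).
    doc_years:  list of years t_d, aligned with doc_topics.
    Assumption: p(d|y) = 1 / (number of documents in year y).
    """
    sums = {}
    counts = defaultdict(int)
    for theta, year in zip(doc_topics, doc_years):
        if year not in sums:
            sums[year] = [0.0] * len(theta)
        for z, p in enumerate(theta):
            sums[year][z] += p
        counts[year] += 1
    # Divide each per-year sum by the number of documents in that year.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


# Toy data: three documents, two topics (made-up values).
doc_topics = [[0.8, 0.2], [0.4, 0.6], [0.1, 0.9]]
doc_years = [1990, 1990, 2000]
p_z_given_y = topic_prob_by_year(doc_topics, doc_years)
```

Because each p(z|d) sums to one, every per-year row returned here also sums to one, so the rows can be compared across years directly.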

Experiments

  • Ran LDA with 100 topics and kept the 36 relevant ones.
  • Seeded words for 10 more topics to improve coverage.
  • Used these 36+10 topics as priors for a new 100-topic run.
  • Picked 43 topics from that run and labeled them manually.
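
One way to picture the third step is as an asymmetric topic-word prior in which the kept and seeded topics get extra pseudo-counts on their characteristic words, while the remaining topics keep a flat prior. This page does not say how the priors were encoded, so the following is only a sketch under that assumption: the function name `build_topic_word_prior`, the `base` and `boost` values, and the toy vocabulary are all hypothetical.

```python
def build_topic_word_prior(vocab, kept_topics, seed_topics, num_topics,
                           base=0.01, boost=1.0):
    """Return a num_topics x len(vocab) matrix of Dirichlet pseudo-counts.

    kept_topics: list of word -> probability dicts from the first LDA run.
    seed_topics: list of seed-word lists for the hand-seeded topics.
    Rows beyond the kept and seeded topics keep a flat prior of `base`.
    """
    if len(kept_topics) + len(seed_topics) > num_topics:
        raise ValueError("more prior topics than num_topics")
    index = {w: i for i, w in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(num_topics)]
    row = 0
    # Kept topics: add pseudo-counts proportional to the earlier word probabilities.
    for topic in kept_topics:
        for word, prob in topic.items():
            if word in index:
                prior[row][index[word]] += boost * prob
        row += 1
    # Seeded topics: add a fixed pseudo-count to each seed word.
    for seeds in seed_topics:
        for word in seeds:
            if word in index:
                prior[row][index[word]] += boost
        row += 1
    return prior


# Toy example: 1 kept topic, 1 seeded topic, 4 topics in total.
vocab = ["parse", "grammar", "translation", "alignment", "speech"]
kept = [{"parse": 0.6, "grammar": 0.4}]
seeds = [["translation", "alignment"]]
prior = build_topic_word_prior(vocab, kept, seeds, num_topics=4)
```

In the setting described above, `kept_topics` would hold the 36 retained topics, `seed_topics` the 10 seeded ones, and `num_topics` would be 100, leaving 54 topics with a flat prior.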

Results

This is only a subset of their results; the paper contains more.

  • Trending topics in the CL community

Halltrend.png

  • Declining topics in the CL community

Halltdecline.png

  • NLP applications

They investigated whether CL is becoming more applied over time.
They explored six applications: Machine Translation, Spelling Correction, Dialogue Systems, Call Routing, Speech Recognition, and Biomedical.
Hallapp.png