Hall emnlp2008
From Cohen Courses
Revision as of 15:27, 1 April 2011
Citation
- Title : Studying the History of Ideas Using Topic Models
- Authors : D. Hall, D. Jurafsky, and C. D. Manning
- Venue : EMNLP 2008
Summary
This paper uses topic models to study how ideas in computational linguistics developed over time, using papers from CL conferences (ACL, COLING, EMNLP, etc.).
Dataset
ACL Anthology (~12,500 papers)
Model
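Instead of using dynamic topic models, they used a static topic model (vanilla LDA) and, as a post hoc step, computed the observed probability of each topic z in a given year y:
$$\hat{p}(z \mid y) = \sum_{d : t_d = y} \hat{p}(z \mid d)\, \hat{p}(d \mid y)$$
where t_d is the publication year of paper d, so the sum runs over all papers published in year y.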
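As a minimal sketch (not the authors' code), the per-year estimate p̂(z|y), the sum over papers d with t_d = y of p̂(z|d)·p̂(d|y), can be computed from each paper's LDA topic distribution p̂(z|d) and its year t_d. The sketch assumes p̂(d|y) is uniform over the papers published in year y, i.e. 1 divided by that year's paper count. The function name `topic_prob_by_year` and the list-based input format are illustrative choices.

```python
from collections import defaultdict


def topic_prob_by_year(doc_topics, doc_years):
    """Estimate p_hat(z | y) = sum_{d : t_d = y} p_hat(z | d) * p_hat(d | y).

    doc_topics: list of per-paper topic distributions p_hat(z | d), each a list of floats
    doc_years:  list of publication years t_d, aligned with doc_topics
    Assumption: p_hat(d | y) is uniform, i.e. 1 / (number of papers in year y).
    """
    num_topics = len(doc_topics[0])

    # Count papers per year so each paper gets weight p_hat(d | y) = 1 / count.
    papers_per_year = defaultdict(int)
    for year in doc_years:
        papers_per_year[year] += 1

    # Accumulate the weighted topic distributions of each year's papers.
    totals = defaultdict(lambda: [0.0] * num_topics)
    for theta, year in zip(doc_topics, doc_years):
        weight = 1.0 / papers_per_year[year]  # p_hat(d | y)
        row = totals[year]
        for z, p in enumerate(theta):
            row[z] += p * weight
    return dict(totals)


# Toy example: three papers, two topics, two years.
doc_topics = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
doc_years = [2000, 2000, 2001]
print(topic_prob_by_year(doc_topics, doc_years))
```

Because each p̂(z|d) sums to 1 and the uniform weights for a year also sum to 1, each year's vector p̂(·|y) is itself a distribution over topics.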
Experiments
- Ran LDA with 100 topics and kept the 36 relevant ones.
- Seeded words for 10 additional topics to improve coverage.
- Used these 46 (36 + 10) topics as priors for a new 100-topic run.
- Picked 43 of the resulting topics and labeled them manually.
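This summary does not say exactly how the kept and seeded topics were turned into priors. One plausible reading, sketched here purely as an assumption, is a K × V matrix of Dirichlet pseudo-counts over topic-word distributions: each kept topic contributes its learned word distribution scaled by a strength factor, each seeded topic gets extra mass on its seed words, and the remaining topics keep a small symmetric prior. `build_topic_word_prior` and its `base`, `strength`, and `seed_weight` parameters are hypothetical.

```python
def build_topic_word_prior(kept_topics, seed_word_lists, vocab, num_topics=100,
                           base=0.01, strength=100.0, seed_weight=1.0):
    """Hypothetical num_topics x V Dirichlet prior over topic-word distributions.

    kept_topics:     list of word distributions (length V each) kept from the first run
    seed_word_lists: list of seed-word lists, one list per seeded topic
    vocab:           list of V words defining the column order
    Rows after the kept and seeded topics keep only the symmetric base prior.
    """
    if len(kept_topics) + len(seed_word_lists) > num_topics:
        raise ValueError("more kept/seeded topics than num_topics")
    word_index = {w: i for i, w in enumerate(vocab)}
    prior = [[base] * len(vocab) for _ in range(num_topics)]

    # Kept topics: turn the learned distribution into pseudo-counts.
    for k, dist in enumerate(kept_topics):
        for v, p in enumerate(dist):
            prior[k][v] += strength * p

    # Seeded topics: add pseudo-counts to each seed word found in the vocabulary.
    for offset, seeds in enumerate(seed_word_lists):
        row = prior[len(kept_topics) + offset]
        for word in seeds:
            if word in word_index:
                row[word_index[word]] += seed_weight
    return prior


vocab = ["translation", "alignment", "spelling", "error", "parse"]
kept = [[0.5, 0.4, 0.0, 0.0, 0.1]]  # one topic kept from the first run
seeds = [["spelling", "error"]]     # one seeded topic
prior = build_topic_word_prior(kept, seeds, vocab, num_topics=3)
print(len(prior), len(prior[0]))
```

Larger `strength` or `seed_weight` values pull the second run more tightly toward the kept and seeded topics; the right scale depends on corpus size and is a tuning choice, not something taken from the paper.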
Results
These are only a subset of the results; the paper reports more.
- Trending topics in the CL community
- Declining topics in the CL community
- NLP applications
They investigated whether CL is becoming more applied over time.
They examined six applications: Machine Translation, Spelling Correction, Dialogue Systems, Call Routing, Speech Recognition, and Biomedical.
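As a hedged sketch, rising and declining topics can be flagged from the yearly topic probabilities p̂(z|y) (the share of year y's papers devoted to topic z) by fitting an ordinary least-squares slope per topic, and an application area can be tracked by summing p̂(z|y) over a hand-picked group of topic ids. Both the slope criterion and the helper names `topic_slopes` and `group_share` are assumptions, not necessarily how the paper selected or plotted its topics.

```python
def topic_slopes(prob_by_year):
    """Least-squares slope of p_hat(z | y) against year, one value per topic."""
    years = sorted(prob_by_year)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years to fit a trend")
    mean_year = sum(years) / n
    var_year = sum((y - mean_year) ** 2 for y in years)
    num_topics = len(prob_by_year[years[0]])

    slopes = []
    for z in range(num_topics):
        values = [prob_by_year[y][z] for y in years]
        mean_val = sum(values) / n
        cov = sum((y - mean_year) * (v - mean_val) for y, v in zip(years, values))
        slopes.append(cov / var_year)
    return slopes


def group_share(prob_by_year, topic_ids):
    """Total p_hat(z | y) of a hand-picked group of topics, per year (sorted by year)."""
    return {y: sum(probs[z] for z in topic_ids) for y, probs in sorted(prob_by_year.items())}


# Toy data: topic 0 rises, topic 1 falls, topic 2 is flat.
prob_by_year = {
    2000: [0.1, 0.5, 0.4],
    2001: [0.2, 0.4, 0.4],
    2002: [0.3, 0.3, 0.4],
}
slopes = topic_slopes(prob_by_year)
trending = [z for z, s in enumerate(slopes) if s > 0]
declining = [z for z, s in enumerate(slopes) if s < 0]
print(trending, declining)
print(group_share(prob_by_year, topic_ids=[0, 2]))
```

A linear slope is a coarse summary; a topic that peaks and then fades can get a slope near zero, so plotting p̂(z|y) directly, as the paper's figures do, remains the more informative view.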