Difference between revisions of "Hall emnlp2008"
Revision as of 16:25, 1 April 2011
Citation
- Title : Studying the History of Ideas Using Topic Models
- Authors : D. Hall, D. Jurafsky, and C. D. Manning
- Venue : EMNLP 2008
Summary
This paper uses topic models to study the development of ideas over time in papers from computational linguistics conferences (ACL, COLING, EMNLP, etc.).
Dataset
ACL Anthology (~12,500 papers)
Model
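Instead of using dynamic topic models, they used a static topic model (vanilla LDA) with post hoc analysis to calculate the observed probability of each topic in a given year, computed as follows:
<math>\hat{p}(z|y) = \sum_{d:t_d=y} \hat{p}(z|d) \hat{p}(d|y)</math>
where z is a topic, y is a year, and the sum runs over the documents d whose publication date t_d falls in year y.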
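The post hoc computation of the per-year topic probability can be illustrated with a minimal sketch. It assumes that p(d|y) is uniform over the documents published in year y, so the estimate reduces to averaging the per-document topic distributions within each year. The function name `topic_probability_by_year` and the toy data are hypothetical, not taken from the paper.

```python
import numpy as np


def topic_probability_by_year(doc_topic, doc_years):
    """Estimate p(z|y) from per-document topic distributions.

    doc_topic: array of shape (n_docs, n_topics); row d holds p(z|d).
    doc_years: sequence of length n_docs; entry d is the year t_d.
    Assumption: p(d|y) is uniform over the documents of year y.
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    doc_years = np.asarray(doc_years)
    result = {}
    for year in np.unique(doc_years):
        in_year = doc_years == year  # selects documents with t_d = y
        # Uniform p(d|y) turns the weighted sum into a plain mean.
        result[int(year)] = doc_topic[in_year].mean(axis=0)
    return result


# Toy example: three documents, two topics.
doc_topic = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
doc_years = [2000, 2000, 2001]
p_z_given_y = topic_probability_by_year(doc_topic, doc_years)
```

Because each row of `doc_topic` sums to one, each per-year vector is again a probability distribution over topics, which makes values comparable across years with different numbers of papers.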
Experiments
- Ran LDA with 100 topics and kept the 36 relevant topics.
- Seeded words for 10 more topics to improve coverage.
- Used these 36+10 topics as priors for a new 100-topic run.
- Picked 43 topics and manually labeled them.
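The summary above does not say how the seed words were encoded as priors. One possible encoding, shown here purely as an illustrative assumption, is a topic-by-word prior matrix in which seed words receive extra pseudo-count mass in their topic, while unseeded topics keep a flat prior. The function `build_seeded_prior`, the `base` and `boost` values, and the toy vocabulary are all hypothetical.

```python
import numpy as np


def build_seeded_prior(vocab, seed_topics, n_topics, base=0.01, boost=1.0):
    """Build a (n_topics, vocab_size) topic-word prior from seed word lists.

    seed_topics[k] lists the seed words for topic k; topics beyond
    len(seed_topics) keep the flat `base` prior. The values of `base`
    and `boost` are illustrative, not taken from the paper.
    """
    if len(seed_topics) > n_topics:
        raise ValueError("more seeded topics than total topics")
    word_index = {word: i for i, word in enumerate(vocab)}
    prior = np.full((n_topics, len(vocab)), base)
    for topic_id, words in enumerate(seed_topics):
        for word in words:
            if word in word_index:  # ignore seeds missing from the vocabulary
                prior[topic_id, word_index[word]] += boost
    return prior


# Toy example: two seeded topics out of four.
vocab = ["translation", "alignment", "parsing", "grammar", "speech"]
seed_topics = [["translation", "alignment"], ["parsing", "grammar"]]
prior = build_seeded_prior(vocab, seed_topics, n_topics=4)
```

A matrix like this could be handed to an LDA implementation that accepts an asymmetric topic-word prior; whether the authors did anything similar is not stated here.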
Results
This is only a subset of their results; there are more in the paper.
- Trending topics in the CL community
- Declining topics in the CL community
- NLP applications
They investigated whether CL is becoming more applied over time.
They explored six applications: Machine Translation, Spelling Correction, Dialogue Systems, Call Routing, Speech Recognition, and Biomedical.