Q. Mei, C. Liu, H. Su, and C. X. Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW
Latest revision as of 09:38, 6 November 2012
Citation
Mei, Qiaozhu, Chao Liu, Hang Su, and ChengXiang Zhai. "A probabilistic approach to spatiotemporal theme pattern mining on weblogs." In Proceedings of the 15th international conference on World Wide Web, pp. 533-542. ACM, 2006.
Online version
Summary
This paper analyzes weblogs through their spatiotemporal patterns, that is, how non-linguistic factors such as time and location influence language usage. In particular, it introduces a probabilistic approach that jointly models subtopic themes and spatiotemporal theme patterns in weblogs. The proposed method can extract themes (topics) from weblogs, generate a theme life cycle for each given location, and generate a theme snapshot for each given time period.
The method has multiple applications, for example weblog summarization, public opinion monitoring, web analysis, and business intelligence.
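The summary describes the model only at a high level. The following is a minimal sketch, assuming a simplified pLSI-style mixture: each word in a post is assigned a theme drawn from a blend of the post's own theme distribution (weight 1 - lam) and a theme distribution shared by all posts from the same time-location cell (weight lam), and all distributions are fitted with EM. The paper's background model, its treatment of the mixing weight, and other details are not reproduced here, and every function and parameter name is hypothetical. Life cycles and snapshots are then read off the fitted time-location distributions with Bayes' rule.

```python
import numpy as np

def _normalize(a, axis):
    # Divide by the sum along `axis`, leaving all-zero rows as zeros instead of NaN.
    s = a.sum(axis=axis, keepdims=True)
    return a / np.where(s > 0, s, 1.0)

def fit_spatiotemporal_themes(counts, context, n_contexts, k, lam=0.5, iters=50, seed=0):
    """EM for a simplified pLSI-style spatiotemporal theme mixture (hypothetical sketch).

    counts:  (D, V) word counts per weblog post
    context: length-D ints; context[d] = t * n_locs + l is the time-location cell of post d
    lam:     fixed weight of the time-location component (assumed, not estimated)
    Returns phi = p(w|theme), pi = p(theme|d), psi = p(theme|t,l).
    """
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    phi = rng.dirichlet(np.ones(V), size=k)           # (k, V)
    pi = rng.dirichlet(np.ones(k), size=D)            # (D, k)
    psi = rng.dirichlet(np.ones(k), size=n_contexts)  # (C, k)
    for _ in range(iters):
        # E-step: split each (post, word) count between themes and between the
        # document component and the time-location component.
        doc_part = (1 - lam) * pi[:, :, None] * phi[None]       # (D, k, V)
        ctx_part = lam * psi[context][:, :, None] * phi[None]   # (D, k, V)
        total = np.maximum((doc_part + ctx_part).sum(axis=1, keepdims=True), 1e-300)
        e_doc = counts[:, None, :] * doc_part / total
        e_ctx = counts[:, None, :] * ctx_part / total
        # M-step: re-estimate each distribution from its expected counts.
        phi = _normalize((e_doc + e_ctx).sum(axis=0), axis=1)
        pi = _normalize(e_doc.sum(axis=2), axis=1)
        ctx_counts = np.zeros((n_contexts, k))
        np.add.at(ctx_counts, context, e_ctx.sum(axis=2))  # pool posts sharing a cell
        psi = _normalize(ctx_counts, axis=1)
    return phi, pi, psi

def cell_sizes(counts, context, n_contexts):
    # Total number of words observed in each time-location cell, used as p(t, l).
    return np.bincount(context, weights=counts.sum(axis=1), minlength=n_contexts)

def life_cycle(psi, sizes, n_times, n_locs, theme, loc):
    # Theme life cycle at one location: p(t | theme, l) is proportional to p(theme | t, l) p(t, l).
    score = psi.reshape(n_times, n_locs, -1)[:, loc, theme] * sizes.reshape(n_times, n_locs)[:, loc]
    return score / score.sum()

def snapshot(psi, sizes, n_times, n_locs, theme, time):
    # Theme snapshot at one time: p(l | theme, t) is proportional to p(theme | t, l) p(t, l).
    score = psi.reshape(n_times, n_locs, -1)[time, :, theme] * sizes.reshape(n_times, n_locs)[time, :]
    return score / score.sum()
```

Keeping lam fixed keeps the EM updates short; a fuller treatment might estimate it or tune it on held-out data.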
Data
The authors collected three datasets: Hurricane Katrina, Hurricane Rita, and iPod Nano. However, none of these datasets can be found on the Internet.
Discussion
The method used in this paper is similar to probabilistic latent semantic indexing (pLSI). It is simpler than LDA-type methods but also suffers from some of the weaknesses of pLSI: the number of parameters in the model grows linearly with the size of the corpus, and the model cannot be applied to documents outside the training data.
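The linear growth can be made concrete by counting parameters. As a rough sketch (ignoring sum-to-one constraints, with hypothetical function names), a k-topic pLSI model stores k word distributions over a vocabulary of size V plus one k-way topic mixture per training document, giving kV + kM parameters for M documents; LDA replaces the per-document mixtures with a single k-dimensional Dirichlet parameter, giving kV + k.

```python
def plsi_param_count(k, vocab_size, n_docs):
    # k topic-word distributions plus one topic mixture per training document.
    return k * vocab_size + k * n_docs

def lda_param_count(k, vocab_size):
    # k topic-word distributions plus a k-dimensional Dirichlet parameter;
    # the count does not depend on the number of documents.
    return k * vocab_size + k

for n_docs in (1_000, 100_000):
    print(n_docs, plsi_param_count(20, 10_000, n_docs), lda_param_count(20, 10_000))
# prints:
# 1000 220000 200020
# 100000 2200000 200020
```

Because the per-document mixtures belong to the training documents, a new post has no mixture parameters of its own, which is the second weakness noted above.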
Analysis and Results
The experiments show that the algorithm performs well at finding subtopics that reflect different interests in a given event, and at revealing the temporal life cycles and location changes of those subtopics.
Related Papers
Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.
Hofmann, Thomas. "Probabilistic latent semantic indexing." In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999.
Study Plan
To understand this paper, you might want to read:
- this seminal paper on Latent Dirichlet Allocation