Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW


Citation

Mei, Qiaozhu, Chao Liu, Hang Su, and ChengXiang Zhai. "A probabilistic approach to spatiotemporal theme pattern mining on weblogs." In Proceedings of the 15th international conference on World Wide Web, pp. 533-542. ACM, 2006.

Online version

Pdf of the paper

Summary

This paper analyzes weblogs through their spatiotemporal patterns, that is, the influence of non-linguistic factors (time and location) on language usage. In particular, it introduces a probabilistic approach that jointly models subtopic themes and spatiotemporal theme patterns in weblogs. The proposed method extracts themes (topics) from weblogs, generates a theme life cycle for each given location, and generates a theme snapshot for each given time period.
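To make the two outputs concrete, here is a minimal sketch of what a theme life cycle and a theme snapshot could look like as data. It assumes per-document theme weights have already been estimated by some topic model, and it simply aggregates and normalizes them; this is an illustration, not the paper's estimation procedure. The input format, the function names `theme_life_cycle` and `theme_snapshot`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def theme_life_cycle(docs, theme, location):
    """Share of a theme's weight per time period at one location.

    docs: iterable of (time, location, theme_weights) tuples, where
    theme_weights maps theme name -> weight (assumed input format).
    """
    mass = defaultdict(float)
    for time, loc, weights in docs:
        if loc == location:
            mass[time] += weights.get(theme, 0.0)
    total = sum(mass.values())
    if total == 0:
        return {}
    # Normalize over time so the shares sum to 1 for this location.
    return {t: m / total for t, m in sorted(mass.items())}


def theme_snapshot(docs, time):
    """Theme shares per location for one time period."""
    mass = defaultdict(lambda: defaultdict(float))
    for t, loc, weights in docs:
        if t == time:
            for theme, w in weights.items():
                mass[loc][theme] += w
    snapshot = {}
    for loc, themes in mass.items():
        total = sum(themes.values())
        if total > 0:
            # Normalize over themes within each location.
            snapshot[loc] = {th: w / total for th, w in themes.items()}
    return snapshot


# Toy data, invented purely for illustration.
docs = [
    ("week1", "Texas", {"rescue": 0.8, "oil": 0.2}),
    ("week2", "Texas", {"rescue": 0.2, "oil": 0.8}),
    ("week1", "Florida", {"rescue": 0.5, "oil": 0.5}),
]

print(theme_life_cycle(docs, "rescue", "Texas"))
print(theme_snapshot(docs, "week1"))
```

In this toy data the "rescue" theme fades in Texas from week1 to week2, which is the kind of pattern a life cycle is meant to expose, while the snapshot compares theme shares across locations within a single week.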

The method introduced in this paper has multiple applications, for example weblog summarization, public opinion monitoring, web analysis, and business intelligence.

Data

In this paper, the authors collect three datasets, i.e., Hurricane Katrina, Hurricane Rita, and iPod Nano. However, none of the datasets can be found on the Internet.

Discussion

The method used in this paper is similar to probabilistic latent semantic indexing (pLSI). It is simpler than LDA-type methods but also suffers from some weaknesses of pLSI: the number of parameters in the model grows linearly with the size of the corpus, and the model cannot be applied to a document outside the training data.
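The parameter-growth point can be checked with simple arithmetic. The sketch below follows the standard counting used when comparing pLSI with LDA (K topic-word distributions, plus either one topic mixture per training document or a single K-dimensional Dirichlet prior). It describes generic pLSI and LDA, not the exact model in this paper, and the function names and corpus sizes are made up for illustration.

```python
def plsi_param_count(num_topics, vocab_size, num_docs):
    # K topic-word distributions plus one K-dimensional topic mixture
    # per training document, so the count grows with the corpus.
    return num_topics * vocab_size + num_topics * num_docs


def lda_param_count(num_topics, vocab_size):
    # K topic-word distributions plus a K-dimensional Dirichlet prior;
    # the count does not depend on the number of documents.
    return num_topics * vocab_size + num_topics


# Hypothetical sizes: 10 topics, 5,000-word vocabulary.
for num_docs in (1_000, 10_000):
    print(
        num_docs,
        plsi_param_count(10, 5_000, num_docs),
        lda_param_count(10, 5_000),
    )
```

Because the per-document mixtures exist only for training documents, a pLSI-style model also has no mixture to use for an unseen document, which is the second weakness noted above.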


Analysis and Results

The experiments show that the algorithm finds subtopics that reflect different interests in a given event, and that it also reveals how those subtopics change over time and across locations.

Related Papers

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022.

Hofmann, Thomas. "Probabilistic latent semantic indexing." Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. ACM, 1999.

Study Plan

To understand this paper you might want to read