Inferring the Diffusion and Evolution of Topics in Social Communities
Citation
Cindy Xide Lin, Qiaozhu Mei, Yunliang Jiang, Jiawei Han, and Shanxiang Qi, "Inferring the Diffusion and Evolution of Topics in Social Communities", Proc. of 2011 ACM SIGKDD Workshop on Social Network Mining and Analysis (SNAKDD'11), San Diego, CA, Aug. 2011.
PDF: [[1]]
Abstract from the paper
The prevailing of Web 2.0 techniques has led to the boom of various online communities, where topics are spreading ubiquitously among user-generated documents. Together with this diffusion process is the content evolution of the topics, where novel contents are introduced in by documents which adopt the topic. Unlike an explicit user behavior (e.g., buying a DVD), both the diffusion paths and the evolutionary process of a topic are implicit, making them much more challenging to be discovered.
In this paper, we aim to simultaneously track the evolution of any arbitrary topic and reveal the latent diffusion paths of that topic in a social community. A novel and principled probabilistic model is proposed which casts our task as an joint inference problem, taking into consideration of textual documents, social influences, and topic evolution in a unified way. Specifically, a mixture model is introduced to model the generation of text according to the diffusion and the evolution of the topic, while the whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field.
Experiments on both synthetic data and real world data show that the discovery of topic diffusion and evolution benefits from this joint inference; and the probabilistic model we propose performs significantly better than existing methods.
Summary
The authors developed a statistical model for Topic-based Information Diffusion and Evolution (TIDE). A mixture model describes how text is generated according to the diffusion and the evolution of the topic. The whole diffusion process is regularized with user-level social influences through a Gaussian Markov Random Field.
Given a social community, a user-generated document collection, and a primitive topic, the authors identify two major tasks in tracking the diffusion and evolution of topics:
- Infer the Diffusion Graph
- Track Topic Evolution
Intuitions
- A significant diffusive flow between two documents implies that their content tends to be highly related.
- The diffusion process among documents is regularized by the social connections of their authors.
- As diffusion proceeds, both the semantics and the regularization effect evolve over time.
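The first intuition can be illustrated with a small mixture-model sketch. This is not the paper's generative model: the function `mixture_log_likelihood`, the background smoothing weight `lam`, and the toy word distributions are all assumptions made for illustration. The sketch only shows that a document's text is explained better when most of the diffusion weight sits on an upstream document with related content.

```python
import math

def mixture_log_likelihood(doc_counts, parent_models, weights, background, lam=0.1):
    """Log-likelihood of a document under a mixture of upstream word distributions.

    doc_counts:    {word: count} for the document being explained
    parent_models: list of {word: probability}, one per upstream document (assumed)
    weights:       hypothetical diffusion weights over the parents, summing to 1
    background:    {word: probability}, used for smoothing (an assumption)
    lam:           mixing weight of the background model (an assumption)
    """
    total = 0.0
    for word, count in doc_counts.items():
        # Probability of the word under the weighted mixture of parents.
        p_mix = sum(w * model.get(word, 0.0) for w, model in zip(weights, parent_models))
        # Smooth with the background so unseen words never get probability zero.
        p = (1.0 - lam) * p_mix + lam * background.get(word, 1e-9)
        total += count * math.log(p)
    return total

# Toy data: one related and one unrelated upstream document.
doc = {"diffusion": 2, "graph": 1}
related = {"diffusion": 0.6, "graph": 0.3, "topic": 0.1}
unrelated = {"protein": 0.7, "gene": 0.3}
background = {w: 0.2 for w in ["diffusion", "graph", "topic", "protein", "gene"]}

ll_related = mixture_log_likelihood(doc, [related, unrelated], [0.9, 0.1], background)
ll_unrelated = mixture_log_likelihood(doc, [related, unrelated], [0.1, 0.9], background)
print(ll_related > ll_unrelated)
```

Inferring the diffusion graph then amounts to searching for the weights that best explain the text, subject to the social regularization named in the second intuition.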
Model
The model is defined over a community G, a document collection D, a primitive topic, a stream of topics, and the diffusion graph.
The first part of the right-hand side is the "Topic Model"; the second part is the "Diffusion Model".
Topic Model
Diffusion Model
The diffusion model is formulated as a Gaussian Markov Random Field.
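As a rough illustration of what a Gaussian Markov Random Field regularizer does, the sketch below computes the standard pairwise quadratic energy E(x) = 1/2 * sum over edges (i, j) of w_ij * (x_i - x_j)^2, whose density is proportional to exp(-E(x)). What the nodes, values, and edge weights are in TIDE is not stated in this summary, so the names here (`gmrf_energy`, users "a", "b", "c", and their values) are hypothetical.

```python
def gmrf_energy(values, edges):
    """Pairwise quadratic GMRF energy on a weighted graph.

    values: {node: real value}, e.g. a per-node diffusion quantity (assumed)
    edges:  list of (node_i, node_j, weight), e.g. social-tie strengths (assumed)
    Lower energy means the values are smoother across strongly connected nodes.
    """
    return 0.5 * sum(w * (values[i] - values[j]) ** 2 for i, j, w in edges)

# Toy social graph: a-b is a weak tie, b-c is a strong tie.
edges = [("a", "b", 1.0), ("b", "c", 2.0)]

smooth = {"a": 0.5, "b": 0.5, "c": 0.5}   # neighbors agree: no penalty
rough = {"a": 0.0, "b": 1.0, "c": 0.0}    # neighbors disagree: penalized

print(gmrf_energy(smooth, edges))  # 0.0
print(gmrf_energy(rough, edges))   # 1.5
```

Adding such an energy to a likelihood objective pulls strongly connected nodes toward similar values, which is the general sense in which social connections can regularize an inferred diffusion process.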
Experiments
The authors evaluated the model on the DBLP dataset and on Twitter data. They use the NetInf model, the IndCas model, and TIDE- as baselines.
NetInf Model: [[2]]
IndCas Model: [[3]]
TIDE-: The TIDE model with the social regularization step removed.
To evaluate the inferred diffusion network, the models are compared against the true diffusion graphs using graph KL-divergence and graph cosine similarity. TIDE comes closest to the true diffusion graphs.
The authors also present case studies on two themes, one on DBLP and one on Twitter. These suggest that the model's output is reasonable and interpretable.
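The two measures can be sketched as follows. The paper's exact definitions are not reproduced in this summary, so this is one plausible reading: each graph is a dictionary of edge weights, cosine similarity compares the two weight vectors, and KL-divergence compares the normalized weights as distributions over edges. The smoothing constant `eps` and the function names are assumptions.

```python
import math

def graph_cosine(g1, g2):
    """Cosine similarity between two graphs given as {(src, dst): weight}."""
    edges = set(g1) | set(g2)
    dot = sum(g1.get(e, 0.0) * g2.get(e, 0.0) for e in edges)
    n1 = math.sqrt(sum(v * v for v in g1.values()))
    n2 = math.sqrt(sum(v * v for v in g2.values()))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    return dot / (n1 * n2)

def graph_kl(p_graph, q_graph, eps=1e-6):
    """KL(P || Q) over the union of edges, with additive smoothing eps."""
    edges = sorted(set(p_graph) | set(q_graph))
    p = [p_graph.get(e, 0.0) + eps for e in edges]
    q = [q_graph.get(e, 0.0) + eps for e in edges]
    p_sum, q_sum = sum(p), sum(q)
    # Normalize both weight vectors into probability distributions over edges.
    return sum((pi / p_sum) * math.log((pi / p_sum) / (qi / q_sum))
               for pi, qi in zip(p, q))

# Toy graphs: a ground-truth graph and two hypothetical inferred graphs.
true_g = {("d1", "d2"): 0.7, ("d1", "d3"): 0.3}
good_g = {("d1", "d2"): 0.6, ("d1", "d3"): 0.4}
bad_g = {("d2", "d3"): 1.0}

print(graph_cosine(true_g, good_g) > graph_cosine(true_g, bad_g))
print(graph_kl(true_g, good_g) < graph_kl(true_g, bad_g))
```

Under this reading, a better inferred graph has higher cosine similarity and lower KL-divergence with respect to the true graph.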
Related Papers
- Yang, J., and Leskovec, J. 2010. Modeling Information Diffusion in Implicit Networks.
- Kazumi Saito, Masahiro Kimura, Kouzou Ohara, and Hiroshi Motoda. 2010. Selecting information diffusion models over social networks for behavioral analysis. In Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III (ECML PKDD'10), José Luis Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag (Eds.). Springer-Verlag, Berlin, Heidelberg, 180-195.
Study Plan
- Read the related papers by Yang and Leskovec and by Saito et al.
- Basic graph theory concepts such as Graph KL divergence and Graph Cosine similarity.
- Gaussian Markov Random Field. [[4]]