CcLDA, Paul and Girju 2009

M. Paul and R. Girju. Cross-cultural analysis of blogs and forums with mixed-collection topic models. In EMNLP ’09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1408–1417, Morristown, NJ, USA, 2009. Association for Computational Linguistics.

Abstract from paper

This paper presents preliminary results on the detection of cultural differences from people’s experiences in various countries from two perspectives: tourists and locals. Our approach is to develop probabilistic models that would provide a good framework for such studies. Thus, we propose here a new model, ccLDA, which extends over the Latent Dirichlet Allocation (LDA) (Blei et al., 2003) and cross-collection mixture (ccMix) (Zhai et al., 2004) models on blogs and forums. We also provide a qualitative and quantitative analysis of the model on the cross-cultural data.


This paper presents a topic model for a corpus that is made up of several distinct collections (for example, text written by tourists and text written by locals). For every extracted topic, the model estimates:

  • a shared word distribution, reflecting what all collections have in common;
  • for every collection, a collection-specific word distribution.
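
To make the two kinds of distribution concrete, here is a minimal Python sketch of how a single word could be drawn for one topic in one collection. It assumes a switch probability (`p_specific`, a hypothetical name) that decides whether the word comes from the collection-specific distribution or from the shared one. The function name, the collection labels, and all the numbers are invented for illustration; this is not the authors' implementation and it leaves out inference entirely.

```python
import random


def sample_word(shared, specific, p_specific, rng):
    """Draw one word for a fixed topic and a fixed collection.

    shared     -- dict word -> probability, the topic's shared distribution
    specific   -- dict word -> probability, the topic's distribution for
                  this collection
    p_specific -- assumed probability of using the collection-specific
                  distribution instead of the shared one
    rng        -- a random.Random instance
    """
    # Choose which of the two distributions generates this word.
    dist = specific if rng.random() < p_specific else shared
    words = list(dist)
    weights = [dist[w] for w in words]
    # Sample a single word in proportion to its probability.
    return rng.choices(words, weights=weights, k=1)[0]


# Toy, made-up numbers for one topic and two hypothetical collections.
shared_food = {"food": 0.5, "restaurant": 0.3, "eat": 0.2}
specific_food = {
    "tourists": {"menu": 0.6, "price": 0.4},
    "locals": {"recipe": 0.7, "market": 0.3},
}

rng = random.Random(0)
sample = [
    sample_word(shared_food, specific_food["locals"], 0.4, rng)
    for _ in range(10)
]
print(sample)
```

With `p_specific` set to 0 every word comes from the shared distribution, and with 1 every word comes from the collection-specific one; values in between mix the two, which is what lets one topic carry both common and collection-dependent vocabulary.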

External link: [1]