Yano et al ICWSM 2010. What’s Worthy of Comment? Content and Comment Volume in Political Blogs

From Cohen Courses

Citation

Tae Yano and Noah A. Smith. What’s Worthy of Comment? Content and Comment Volume in Political Blogs. In Proceedings of ICWSM 2010.

Online Version

What’s Worthy of Comment? Content and Comment Volume in Political Blogs.

Summary

This paper describes a topic-model-based approach to modeling the relationship between the text content of a political blog post and the comment volume (i.e., the total amount of response) that the post will receive. In essence, given the text of a blog post, the paper uses topic models to predict how much comment volume the post will receive.

Brief description of the method

The authors propose a generative model, called the Topic-Poisson model, which proceeds as follows. The number of topics is fixed in advance.

[Figure: Yano-icwsm-tp.png, the generative story of the Topic-Poisson model]

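Apart from step 2(c), the model is identical to a (smoothed) LDA. In step 2(c), the comment volume $v_d$ of post $d$ is chosen from a mixture of $K$ Poisson distributions (one per topic), $v_d \sim \sum_{k=1}^{K} \theta_{d,k}\,\mathrm{Poisson}(\lambda_k)$, where the post's topic proportions $\theta_{d,1}, \dots, \theta_{d,K}$ are the mixture weights and the rates $\lambda_1, \dots, \lambda_K$ define the Poisson mixture components.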
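To make the generative story concrete, here is a minimal forward-sampling sketch in Python/NumPy. It assumes that the per-post topic proportions θ_d serve as the mixture weights in step 2(c). The function name `sample_topic_poisson`, the hyperparameter values, and the randomly drawn rates λ_k are illustrative and not taken from the paper.

```python
import numpy as np


def sample_topic_poisson(num_docs, doc_len, vocab_size, num_topics,
                         alpha=0.5, eta=0.1, rates=None, seed=0):
    """Forward-sample posts and comment volumes from a Topic-Poisson-style
    generative story. Hyperparameter values are illustrative only."""
    rng = np.random.default_rng(seed)
    if rates is None:
        # Hypothetical per-topic Poisson rates (lambda_k), drawn at random here.
        rates = rng.uniform(5.0, 200.0, size=num_topics)
    # LDA part: one word distribution per topic, smoothed by a Dirichlet prior.
    beta = rng.dirichlet(np.full(vocab_size, eta), size=num_topics)
    docs, volumes = [], []
    for _ in range(num_docs):
        # LDA part: topic proportions theta_d for this post.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # LDA part: a topic z_i, then a word w_i, for every token position.
        z = rng.choice(num_topics, size=doc_len, p=theta)
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        # Step 2(c), assumed form: pick a mixture component k with weights
        # theta_d, then draw the volume from Poisson(lambda_k).
        k = rng.choice(num_topics, p=theta)
        volumes.append(int(rng.poisson(rates[k])))
        docs.append(words)
    return docs, volumes, beta, rates


docs, volumes, beta, rates = sample_topic_poisson(
    num_docs=3, doc_len=10, vocab_size=50, num_topics=4)
print(volumes)  # one synthetic comment volume per post
```

The mixture is sampled in two stages, first a component and then a Poisson draw. This is equivalent to drawing directly from $\sum_k \theta_{d,k}\,\mathrm{Poisson}(\lambda_k)$.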

The authors also present a variation of this model called the Topic Negative Binomial model, where the mixture of Poissons is replaced by a mixture of negative binomials.
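The two models differ only in the component distribution used for the volume. The sketch below compares them by scoring a single volume under each mixture. It assumes a mean/dispersion parameterization of the negative binomial (variance = mean + mean²/r), which may not match the paper's exact parameterization. All helper names and numbers are hypothetical.

```python
import math


def poisson_logpmf(v, lam):
    """log P(v) under a Poisson distribution with mean lam > 0."""
    return v * math.log(lam) - lam - math.lgamma(v + 1)


def nbinom_logpmf(v, mean, r):
    """log P(v) under a negative binomial with the given mean and dispersion r.
    Its variance is mean + mean**2 / r, so it is always larger than the mean."""
    p = r / (r + mean)
    return (math.lgamma(v + r) - math.lgamma(r) - math.lgamma(v + 1)
            + r * math.log(p) + v * math.log1p(-p))


def mixture_logpmf(weights, component_logpmfs):
    """log sum_k w_k P_k(v), computed stably with the log-sum-exp trick."""
    terms = [math.log(w) + lp
             for w, lp in zip(weights, component_logpmfs) if w > 0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


theta = [0.7, 0.3]        # hypothetical topic proportions of one post
means = [10.0, 100.0]     # hypothetical per-topic mean volumes
v = 300                   # an unusually heavily commented post
pois = mixture_logpmf(theta, [poisson_logpmf(v, m) for m in means])
nbin = mixture_logpmf(theta, [nbinom_logpmf(v, m, r=2.0) for m in means])
print(pois, nbin)  # the negative-binomial mixture assigns v far more probability
```

A Poisson's variance equals its mean, so a single heavily commented post is extremely unlikely under it. The negative binomial's extra dispersion parameter gives such heavy-tailed counts non-negligible probability.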

Experimental Result

Task: Predict whether a blog post will receive a higher comment volume than the average observed in the training data. Note that this is a binary decision: the models do NOT predict the absolute volume (e.g., the number of words in the comments).
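The binary target can be built as in the sketch below, where volume can be either the number of comment words or the number of comments. The second function is a hypothetical decision rule for a topic model, not necessarily the one used in the paper. It compares the mixture's mean volume, $\sum_k \theta_k \lambda_k$, with the training average.

```python
from statistics import mean


def high_volume_labels(train_volumes, test_volumes):
    """Binary targets: 1 if a post's volume is above the training-set average."""
    threshold = mean(train_volumes)
    return [1 if v > threshold else 0 for v in test_volumes], threshold


def predict_high_volume(theta, rates, threshold):
    """Hypothetical decision rule: compare the mixture's mean volume,
    sum_k theta_k * lambda_k, with the training-set average."""
    expected = sum(t * lam for t, lam in zip(theta, rates))
    return 1 if expected > threshold else 0


labels, threshold = high_volume_labels([10, 20, 30, 40], [5, 26, 25, 100])
print(labels)  # [0, 1, 0, 1]
```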

The authors use a subset of the Yano & Smith blog dataset, consisting of posts from two blogs: Matthew Yglesias (denoted MY) and Red State (denoted RS).

The compared models were:

  • Naive Bayes (NB)
  • Regression (Reg): linear regression with elastic net regularization
  • Topic Poisson (T-Pois)
  • Topic Negative Binomial (T-NBin)
  • CommentLDA (C-LDA): refer to Yano et al NAACL 2009

Results:

[Figure: Yano-icwsm-result.png, results of the compared models on MY and RS]

Comment volume was measured in two ways: by the number of words in the comments and by the number of comments. Unless noted otherwise, the topic models use the hyperparameter settings reported in the paper.

On the MY dataset, the Topic-Poisson model improves recall significantly over both Naive Bayes and regression, for both volume measures (# words and # comments), with a slight loss in precision relative to those two models. On the RS dataset, recall shows a similar pattern, while Naive Bayes achieves the best precision. Overall, the topic models proposed in this paper do not significantly outperform the baselines.
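The recall/precision trade-off described above can be illustrated with a small sketch. It assumes that precision and recall are computed on the high-volume (positive) class. The toy labels are invented and are not the paper's data.

```python
def precision_recall(gold, predicted, positive=1):
    """Precision and recall for the positive ("high volume") class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


gold = [1, 1, 1, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0]   # predicts "high" rarely
eager = [1, 1, 1, 1, 0, 0]      # predicts "high" more often
print(precision_recall(gold, cautious))  # (1.0, 0.333...)
print(precision_recall(gold, eager))     # (0.75, 1.0)
```

A model that labels more posts as high-volume catches more of the truly high-volume ones (higher recall). It does so at the price of some false alarms (lower precision), which is the pattern reported for Topic-Poisson on MY.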

However, an advantage of a topic-model-based approach is that it exposes (coherent) topics within the blog posts. In the figure below, the authors show the topics discovered in MY by the word-volume Topic-Poisson model. Topics are ranked by their Poisson rate, i.e., the mean comment volume associated with each topic.

[Figure: Yano-icwsm-my-topics.png, topics discovered in MY by the word-volume Topic-Poisson model]

We see that the first topic pertains to race/gender issues and that the second involves the presidential election. The horizontal line between the 9th and 10th topics marks the boundary between topics with above-average and below-average comment volume. While it seems natural for a political blog such as MY to receive few comments on sports, it is surprising that the Iraq War is also deemed a less comment-worthy topic.
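Both the ranking and the boundary line reduce to sorting topics by rate and comparing each rate with the average volume. This is a small sketch that assumes topics are ranked by their Poisson rates λ_k. The topic names and rates are invented for illustration and are not the values in the figure.

```python
def rank_topics(topic_names, rates, average_volume):
    """Sort topics by Poisson rate (highest first) and flag those above average."""
    order = sorted(range(len(rates)), key=lambda k: rates[k], reverse=True)
    return [(topic_names[k], rates[k], rates[k] > average_volume) for k in order]


# Hypothetical topic labels and rates, for illustration only.
names = ["sports", "race/gender", "Iraq war", "election"]
rates = [15.0, 120.0, 30.0, 95.0]
for name, rate, above in rank_topics(names, rates, average_volume=50.0):
    print(f"{name:12s} {rate:6.1f} {'above' if above else 'below'} average")
```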

Discussion

The paper proposes the task of predicting a blog post's comment volume. The authors approach this task by using topic models.

As in Yano et al NAACL 2009, the topic-model-based approach does not significantly outperform the baselines, but it does reveal which types of blog posts tend to attract many comments. The authors argue that modeling topics can improve recall when predicting high-volume posts.

Related Papers

The Topic-Poisson model is essentially a type of supervised or annotated LDA as defined in Blei and McAuliffe (2008) and Ramage et al. (2009).

Study Plan

This paper assumes prior knowledge of topic models. For the basics about topic models, refer to the Study Plans on Yano et al NAACL 2009.

  • Supervised topic models
    • David M. Blei and Jon D. McAuliffe, "Supervised topic models" Neural Information Processing Systems 21, 2007
    • Daniel Ramage, David Hall, Ramesh Nallapati and Christopher D. Manning, "Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora" Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 248–256, 2009