Yano et al ICWSM 2010. What’s Worthy of Comment? Content and Comment Volume in Political Blogs

Citation

Tae Yano and Noah A. Smith. What’s Worthy of Comment? Content and Comment Volume in Political Blogs. In Proc of ICWSM 2010.

Online Version

What’s Worthy of Comment? Content and Comment Volume in Political Blogs.

Summary

This paper describes a topic-model-based approach to modeling the relationship between the text content of a political blog post and its comment volume (i.e., the total amount of response the post receives). In essence, given the text of a blog post, the paper uses topic models to predict how much comment volume the post will receive.

Brief description of the method

The authors propose a generative model called the Topic-Poisson model. The number of topics $K$ is fixed in advance.

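Apart from step 2(c) of the generative process, which draws the comment volume $v_{d}$ of post $d$, the model is identical to (smoothed) LDA. In that step, $v_{d}$ is drawn from a mixture of distributions, where $m_{d,k}$ are the mixture weights and the mixture components $p$ are Poisson distributions.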
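The volume step can be illustrated with a minimal sketch of a mixture-of-Poissons probability. This is not the authors' code: the function names are hypothetical, the weights and rates are made-up numbers, and it assumes that each topic $k$ has its own Poisson rate $\lambda_{k}$ and that the weights $m_{d,k}$ for a post sum to one.

```python
import math

def poisson_pmf(v, lam):
    """Poisson probability of observing count v with rate lam (log-space for stability)."""
    return math.exp(v * math.log(lam) - lam - math.lgamma(v + 1))

def volume_pmf(v, weights, rates):
    """Probability of volume v under a mixture of Poissons.

    weights: assumed mixture weights m_{d,k} for one post (sum to 1)
    rates:   assumed per-topic Poisson rates lambda_k
    """
    return sum(m * poisson_pmf(v, lam) for m, lam in zip(weights, rates))

def expected_volume(weights, rates):
    """Mean of the mixture: the weighted average of the per-topic rates."""
    return sum(m * lam for m, lam in zip(weights, rates))

# Hypothetical post that is mostly about a low-volume topic.
weights = [0.7, 0.2, 0.1]
rates = [5.0, 40.0, 120.0]
mean_volume = expected_volume(weights, rates)
```

Under these assumptions, a post dominated by topics with large rates gets a large expected volume, which is what lets topic content inform the volume prediction.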

The authors also present a variation of this model called the Topic Negative Binomial model, where the mixture of Poissons is replaced by a mixture of negative binomials.
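As a rough sketch of what swapping the mixture components involves, the code below evaluates a mixture of negative binomials. The $(r, p)$ parameterization, the function names, and the numbers are assumptions for illustration; the paper may parameterize the distribution differently. One general property of the negative binomial, shown here, is that its variance exceeds its mean, whereas a Poisson has variance equal to its mean.

```python
import math

def neg_binomial_pmf(v, r, p):
    """Negative binomial probability of count v (assumed parameterization:
    r > 0 'successes', success probability p, v counts the failures)."""
    log_coef = math.lgamma(v + r) - math.lgamma(v + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(p) + v * math.log(1.0 - p))

def nb_volume_pmf(v, weights, params):
    """Probability of volume v under a mixture of negative binomials.

    weights: assumed mixture weights m_{d,k} for one post (sum to 1)
    params:  assumed per-topic (r, p) pairs
    """
    return sum(m * neg_binomial_pmf(v, r, p) for m, (r, p) in zip(weights, params))

def nb_mean(r, p):
    return r * (1.0 - p) / p

def nb_variance(r, p):
    return r * (1.0 - p) / (p * p)

# Hypothetical two-topic example.
nb_weights = [0.6, 0.4]
nb_params = [(2.0, 0.1), (5.0, 0.05)]
```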

Experimental Result

Task: Predict whether a blog post will receive a higher comment volume than the average seen in the training data. Note that the authors are not predicting the absolute number of words in the comments.
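The task setup can be sketched as follows: the mean volume of the training posts serves as the threshold, and each test post is labeled high-volume only if it exceeds that threshold. The function name and the numbers are hypothetical, and the strict "greater than" comparison is an assumption based on the wording "higher than the average".

```python
def high_volume_labels(train_volumes, test_volumes):
    """Label each test post True if its volume exceeds the training-set mean."""
    threshold = sum(train_volumes) / len(train_volumes)
    labels = [v > threshold for v in test_volumes]
    return labels, threshold

# Hypothetical volumes (e.g., number of comments per post).
train = [10, 20, 30, 140]
test = [5, 50, 51, 300]
labels, threshold = high_volume_labels(train, test)
```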

The authors use a subset of the Yano & Smith blog dataset: data from two blogs, Matthew Yglesias (denoted MY) and Red State (denoted RS).

The compared models were:

Naive Bayes (NB)
Regression (Reg): linear regression with elastic net regularization
Topic Poisson (T-Pois)
Topic Negative Binomial (T-NBin)
CommentLDA (C-LDA): refer to Yano et al NAACL 2009

Results:

Word counts and comment counts were used to measure volume. Unless noted, the topic models have their parameters set to $K=15$, $\alpha=0.1$, and $\beta=0.1$.

The Topic-Poisson model improves recall substantially over Naive Bayes, for both volume measures (number of words and number of comments), with a slight loss in precision. Its precision lags behind that of the regression model.
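For readers unfamiliar with the two metrics, here is a minimal sketch of how precision and recall are computed for the high-volume class. The function name and the toy predictions are hypothetical and are not the paper's results.

```python
def precision_recall(predicted, actual):
    """Precision and recall for the positive (high-volume) class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: a model that flags many posts as high-volume finds every
# truly high-volume post (high recall) at the cost of false alarms.
predicted = [True, True, True, True, False]
actual = [True, False, True, False, False]
precision, recall = precision_recall(predicted, actual)
```

In this toy case the model has perfect recall but only half of its positive predictions are correct, which is the kind of trade-off the comparison above describes.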

The authors also show the topics discovered in MY by the word-volume Topic-Poisson model. Topics are ranked by $\lambda_{k}$.

Discussion

Modeling topics can improve recall when predicting high volume posts.

Related Papers

The Topic-Poisson model is essentially a type of supervised or annotated LDA as defined in Blei and McAuliffe (2008) and Ramage et al. (2009).
