Yano et al NAACL 2009
Citation
Tae Yano, William Cohen, and Noah A. Smith. Predicting Response to Political Blog Posts with Topic Models. In Proceedings of NAACL 2009.
Online Version
Predicting Response to Political Blog Posts with Topic Models.
Summary
This paper describes a topic-model-based approach to modeling the generation of blog text (posts and comments).
Brief description of the method
This paper expands upon LinkLDA, presented in Erosheva et al. (2004).
Here, θ is a distribution over topics, β is a multinomial distribution over post words given a topic, and γ is a multinomial distribution over (comment) users given a topic. N and M are the word counts in the post and in all of its comments, respectively.
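For concreteness, the per-post generative story of LinkLDA implied by this description can be written as follows. This is only a sketch: the Dirichlet hyperparameter α is assumed, while θ, β, γ, N, and M are as above.

\begin{align*}
\theta &\sim \mathrm{Dirichlet}(\alpha) && \text{per-post topic mixture}\\
z_n &\sim \mathrm{Multinomial}(\theta), \quad w_n \sim \mathrm{Multinomial}(\beta_{z_n}) && n = 1,\dots,N \ \text{(post words)}\\
z'_m &\sim \mathrm{Multinomial}(\theta), \quad u_m \sim \mathrm{Multinomial}(\gamma_{z'_m}) && m = 1,\dots,M \ \text{(comment users)}
\end{align*}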
Although LinkLDA can model which users are likely to respond to a post, it does not model the comment text they will write.
The authors expand on this by proposing CommentLDA, as shown below.
In CommentLDA, the comment text is additionally modeled by β′, a distribution over comment words given topics.
The authors provide three variations on how the comments are counted: verbosity, response, and comments.
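A minimal, non-authoritative sketch of the CommentLDA generative story described above is given below. The function and variable names are illustrative assumptions, not the authors' implementation, and the number of comment tokens depends on which of the three counting variations is used.

import numpy as np

def generate_post(alpha, beta, beta_prime, gamma, n_post_words, n_comment_tokens, rng):
    """Sample one post and its comment side under CommentLDA (illustrative only).

    alpha      : Dirichlet hyperparameter, length K
    beta       : (K, V)  topic -> post-word distributions
    beta_prime : (K, Vc) topic -> comment-word distributions (CommentLDA only)
    gamma      : (K, U)  topic -> commenting-user distributions
    """
    K = len(alpha)
    theta = rng.dirichlet(alpha)                      # per-post topic mixture

    post_words = []
    for _ in range(n_post_words):                     # the post's own text
        z = rng.choice(K, p=theta)
        post_words.append(rng.choice(beta.shape[1], p=beta[z]))

    comment_users, comment_words = [], []
    for _ in range(n_comment_tokens):                 # comment tokens; how they are
        z = rng.choice(K, p=theta)                    # counted follows one of the
        comment_users.append(rng.choice(gamma.shape[1], p=gamma[z]))        # three variations above
        comment_words.append(rng.choice(beta_prime.shape[1], p=beta_prime[z]))
    return post_words, comment_users, comment_words

Dropping the comment-word draw (the beta_prime line) recovers LinkLDA, which generates only which users respond, not the words they write.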
Experimental Result
Task: given a training dataset consisting of a collection of blog posts and their commenters and comments, and an unseen test dataset from a later time period, predict who is going to comment on a new blog post from the test set.
The authors have released the Yano & Smith blog dataset, which was used for this evaluation.
The compared models were:
- Baseline: post-independent prediction that ranks users by their comment frequency (sketched after this list)
- Naive Bayes: with word counts in the post's main entry as features
- LinkLDA: 3 variations (verbosity, response, comments)
- CommentLDA: 3 variations (verbosity, response, comments)
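A minimal sketch of the comment-frequency baseline mentioned in the first item above. The (post_id, user_id) pair format and the function name are assumptions made for illustration.

from collections import Counter

def rank_users_by_frequency(train_pairs):
    """Post-independent baseline: rank users by how often they commented in training.

    train_pairs: iterable of (post_id, user_id) pairs, one per comment.
    The same ranking is predicted for every test post, since the post text is ignored.
    """
    counts = Counter(user for _, user in train_pairs)
    return [user for user, _ in counts.most_common()]

# Example: 'alice' commented twice and 'bob' once, so 'alice' is ranked first.
print(rank_users_by_frequency([("p1", "alice"), ("p1", "bob"), ("p2", "alice")]))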