Difference between revisions of "Comparison mixed membership topic poisson"

From Cohen Courses

Revision as of 02:47, 6 November 2012

Papers

Elena Erosheva, Stephen Fienberg, John Lafferty. Mixed Membership Models of Scientific Publications. PNAS (101) 2004.

Tae Yano and Noah A. Smith. What’s Worthy of Comment? Content and Comment Volume in Political Blogs. Proc of ICWSM 2010.

Problems

The two papers address different problems. The goal of the PNAS paper is to model scientific publications using mixed membership models. Its emphasis is on adding references as an additional source of information and modeling article contents and references simultaneously. The ICWSM paper, in contrast, focuses on predicting the volume of comments that a blog post receives.

Method

Datasets

The PNAS paper uses the PNAS archive of Biological Sciences articles published between 1997 and 2001 as its dataset. The dataset contains a total of 11,981 articles and 77,115 unique references.

The ICWSM paper builds its dataset by collecting blog posts from two websites: Matthew Yglesias and RedState. Stop words are removed during text preprocessing. The mean comment volume per post is approximately 1424 words (35 comments) for Matthew Yglesias and 819 words (29 comments) for RedState.
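
The preprocessing and summary statistics described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the stop-word list, the whitespace tokenizer, the function names, and the toy posts are all assumptions made for illustration, and the sketch counts raw whitespace-separated tokens when measuring comment volume.

```python
from statistics import mean

# Hypothetical, tiny stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def mean_comment_volume(posts):
    """Return (mean words per post, mean comments per post).

    `posts` is a list of posts; each post is a list of comment strings.
    Word counts are raw whitespace tokens (an assumption of this sketch).
    """
    words_per_post = [sum(len(c.split()) for c in comments) for comments in posts]
    comments_per_post = [len(comments) for comments in posts]
    return mean(words_per_post), mean(comments_per_post)


# Toy data standing in for scraped blog comments (not from either site).
toy_posts = [
    ["I agree with the post", "Not sure about this"],
    ["Great point"],
]

print(remove_stop_words("The volume of comments"))
print(mean_comment_volume(toy_posts))
```

Applied to a real crawl, the same two averages would correspond to the words and comments figures reported for each site.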


Method