Difference between revisions of "Roja Bandari et. al. ICWSM 2012"

From Cohen Courses

Revision as of 21:55, 26 September 2012

Citation

R Bandari, S Asur, BA Huberman. The Pulse of News in Social Media: Forecasting Popularity, ICWSM 2012


Summary

In this paper, the authors address the following problem: predicting the popularity of news articles prior to their release. They extract features from each article based on its content, apply two methods to predict popularity (regression and classification), and evaluate the predictions against the actual popularity observed on social media such as Twitter.

Datasets

They collected all news articles from August 8th to 16th using the API of a news aggregator called FeedZilla. Each article includes a title, a short summary, a URL, a timestamp, and a category. After cleaning, the dataset contains over 42,000 articles.

They then used a service called Topsy to collect the number of times each news article was posted and retweeted on Twitter.
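
To make the shape of the data concrete, the record described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `Article`, the field names, the helper `attach_tweet_counts`, and all sample values are hypothetical and do not come from the paper, FeedZilla, or Topsy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class Article:
    """One news article with the fields listed above (names are hypothetical)."""
    title: str
    summary: str
    url: str
    timestamp: datetime
    category: str
    tweet_count: int = 0  # times the article was posted or retweeted on Twitter


def attach_tweet_counts(articles: List[Article],
                        counts_by_url: Dict[str, int]) -> List[Article]:
    """Set tweet_count from a URL -> count mapping; unknown URLs stay at 0."""
    for article in articles:
        article.tweet_count = counts_by_url.get(article.url, 0)
    return articles


# Placeholder records for illustration only; dates and values are made up.
articles = [
    Article("Example title A", "Short summary A", "http://example.com/a",
            datetime(2000, 1, 1, 9, 0), "Technology"),
    Article("Example title B", "Short summary B", "http://example.com/b",
            datetime(2000, 1, 1, 10, 30), "Health"),
]

# Placeholder counts standing in for what a tweet-count lookup would return.
counts_by_url = {"http://example.com/a": 12}

labeled = attach_tweet_counts(articles, counts_by_url)
```

Keying the counts by URL is an assumption made here for simplicity; the tweet count attached to each record is the popularity value that predictions would be compared against.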

Features