Roja Bandari et al. ICWSM 2012
Citation
R. Bandari, S. Asur, B. A. Huberman. The Pulse of News in Social Media: Forecasting Popularity. ICWSM 2012.
Summary
In this paper, the authors address the following problem: predicting the popularity of news articles prior to their release. They extract features from each article based on its content and use two methods, regression and classification, to predict its popularity. The predictions are evaluated against the article's actual popularity on social media, as measured on Twitter.
Datasets
They collected all news articles from August 8th to 16th using the API of a news aggregator called FeedZilla. Each article includes a title, a short summary, a URL, a timestamp, and a category. After cleaning, the dataset contains over 42,000 articles.
They then used a service called Topsy to collect the number of times each news article was posted and retweeted on Twitter.
Features
- Category of the article, such as technology or entertainment. They define a measure called t-density, the average number of tweets per link in a particular category, to quantify the popularity of that category.
- Subjectivity of the article. They use an existing NLP toolkit to assign each article a binary label: subjective or objective.
- Named entities. They use t-density to measure the popularity of each named entity. Then, for a given article, they calculate a score based on the named entities extracted from it.
- Source of the news. The t-density is also used to measure the popularity of a news source, such as a weblog.
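The t-density used by several of the features above can be illustrated with a minimal sketch. It assumes each article is a dictionary with a tweet count and grouping fields; the field names (`category`, `source`, `tweets`), the sample data, and the helper names are hypothetical. In particular, averaging entity t-densities to score an article is an assumption made for illustration, since this summary does not state how the paper aggregates them.

```python
from collections import defaultdict


def t_density(articles, key):
    """Average number of tweets per link for each value of `key`
    (for example, the article's category or source)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for article in articles:
        totals[article[key]] += article["tweets"]
        counts[article[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


def entity_score(entities, entity_density):
    """Score an article from the t-densities of its named entities.
    Assumption: a plain mean over entities with a known t-density."""
    known = [entity_density[e] for e in entities if e in entity_density]
    return sum(known) / len(known) if known else 0.0


# Hypothetical sample data, not taken from the paper.
articles = [
    {"category": "technology", "source": "blog-a", "tweets": 30},
    {"category": "technology", "source": "blog-b", "tweets": 10},
    {"category": "entertainment", "source": "blog-a", "tweets": 5},
]

category_density = t_density(articles, "category")
source_density = t_density(articles, "source")
```

The same `t_density` helper serves the category and source features because both reduce to a grouped average of tweet counts; only the grouping key changes.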