Sun, E., I. Rosenn, C. A. Marlow, and T. M. Lento. Gesundheit! Modeling Contagion through Facebook News Feed. Proc. ICWSM 2009.
The authors hypothesize that diffusion might work differently in social media networks than in other kinds of networks, and in this paper they analyze diffusion through Facebook using Facebook fan Pages. They find that Facebook diffusion chains are often extremely long but are not usually the result of a single chain-reaction event. Rather, these diffusion chains are typically started by a substantial number of users, and large clusters emerge when hundreds or even thousands of short diffusion chains merge together. They also find no evidence that a start node's maximum diffusion chain length can be predicted from the user's demographics or Facebook usage characteristics.
Most diffusion models start with an isolated event and explore the conditions under which that event triggers a global cascade. The authors believe this may not be the best way to model the spread of information through a social media network. Moreover, most models are developed without directly relevant empirical data, which the authors address by using a large quantity of real-world data from Facebook. They investigate whether information necessarily spreads through these networks via long, branching chains of adoption, or instead via large-scale collisions of shorter chains.
Dataset and Facebook diffusion mechanics
The authors concentrate on the News Feed propagation of Facebook fan pages, a particularly viral feature of Facebook. Pages are distinct, customized profiles designed for businesses, bands, celebrities, etc. to represent themselves on Facebook.
Diffusion of Pages occurs when 1) a user fans a Page; 2) this action is broadcast to their friends’ News Feeds; and 3) one or more of their friends sees the item and decides to become a fan as well.
The authors analyze Page data by building, for each Page on Facebook, trees that link actors to their followers (strictly speaking these are directed acyclic graphs, since a node can have more than one parent), and they measure diffusion by the number of levels in a chain. Because News Feed aggregates stories, a user may see several friends fan a Page in a single News Feed story. For example, Charlie may see the story: “Alice and Bob became fans of Page XYZ.” In this case, Charlie's node would have two parents. Furthermore, if Alice and Bob were on separate diffusion chains, those two chains would now merge.
They infer links from News Feed impressions of friends' Page fanning activity: if a user Ua saw that friend Ub became a fan of Page P within the 24 hours before Ua became a fan, they record an edge from the actor Ub to the follower Ua. As the trees grow, some users (i.e., fans in the middle of a chain) become both actors and followers; some users are actors but not followers (the chain starters); and some users are followers but not actors (the leaves of the chains).
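The edge-inference rule and the three roles can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the input formats (`fan_times`, a dict of fanning timestamps for one Page, and `exposures`, a list of `(viewer, friend, seen_at)` News Feed impressions) and the helper names `infer_edges` and `classify_roles` are hypothetical, and the sketch assumes the 24-hour window is measured between the impression and the viewer's own fanning action.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)  # attribution window described in the summary


def infer_edges(fan_times, exposures):
    """Return (actor, follower) edges for one Page.

    fan_times: {user: datetime when the user fanned the Page}  (hypothetical format)
    exposures: iterable of (viewer, friend, seen_at) impressions, meaning
               `viewer` saw a News Feed story about `friend` fanning the Page.
    An aggregated story ("Alice and Bob became fans ...") is treated as one
    impression per friend named in it, so a follower can get several parents.
    """
    edges = set()
    for viewer, friend, seen_at in exposures:
        if viewer not in fan_times or friend not in fan_times:
            continue  # only fans take part in the diffusion graph
        fanned_at = fan_times[viewer]
        # Assumption: the impression must fall in the 24 hours before the viewer fanned.
        if fanned_at - WINDOW <= seen_at < fanned_at:
            edges.add((friend, viewer))
    return edges


def classify_roles(fans, edges):
    """Label each fan as a chain starter, middle node, leaf, or isolated fan."""
    actors = {a for a, _ in edges}
    followers = {f for _, f in edges}
    roles = {}
    for user in fans:
        if user in actors and user in followers:
            roles[user] = "middle"
        elif user in actors:
            roles[user] = "starter"
        elif user in followers:
            roles[user] = "leaf"
        else:
            roles[user] = "isolated"  # fanned, but no inferred edges
    return roles


if __name__ == "__main__":
    t0 = datetime(2009, 1, 1, 12, 0)
    fan_times = {
        "Alice": t0,
        "Bob": t0 + timedelta(hours=1),
        "Charlie": t0 + timedelta(hours=5),
        "Dana": t0 + timedelta(days=3),
    }
    exposures = [
        ("Charlie", "Alice", t0 + timedelta(hours=4)),  # aggregated story:
        ("Charlie", "Bob", t0 + timedelta(hours=4)),    # "Alice and Bob ..."
        ("Dana", "Charlie", t0 + timedelta(hours=6)),   # seen > 24 h before Dana fanned
    ]
    edges = infer_edges(fan_times, exposures)
    print(sorted(edges))
    print(classify_roles(fan_times, edges))
```

In the example, Charlie receives two parents (Alice and Bob), mirroring the aggregated-story case, while Dana's impression falls outside the window, so Dana ends up with no inferred edge.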
In addition to the general Pages data, the authors study how maximum chain length varies across chain starters by building a chains dataset of 10 Pages. For each of these Pages, they gathered all associated fans and calculated the maximum chain length for each fan who started a chain. They also collected user-level features such as age, gender, friend count, and several measures of Facebook activity.
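Because a follower always fans after the actor whose story it saw, the inferred graph is acyclic, so a start node's maximum chain length is the longest downstream path from it. A minimal Python sketch, assuming `(actor, follower)` edge pairs and measuring length in edges (so a start node nobody followed gets 0); the function names are hypothetical, and the summary does not say whether the authors count hops or nodes.

```python
from collections import defaultdict, deque


def max_chain_lengths(edges):
    """Longest downstream path, in edges, from every node in a diffusion DAG."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for actor, follower in edges:
        children[actor].append(follower)
        indegree[follower] += 1
        nodes.update((actor, follower))

    # Kahn's algorithm: a topological order that lists actors before followers.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != len(nodes):
        raise ValueError("diffusion graph contains a cycle")

    # Visit nodes in reverse order so each child is scored before its parents.
    longest = {}
    for node in reversed(order):
        longest[node] = max((longest[c] + 1 for c in children[node]), default=0)
    return longest


def start_node_max_chain(fans, edges):
    """max_chain for every fan who did not follow anyone (hypothetical helper)."""
    longest = max_chain_lengths(edges)
    followers = {f for _, f in edges}
    return {u: longest.get(u, 0) for u in fans if u not in followers}
```

The iterative topological pass avoids Python's recursion limit, which a recursive depth-first search could hit on the very long chains the paper reports.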
Analysis: Large Cluster Phenomenon
The authors find that when the diffusion process described above is observed over an extended period, a flurry of chains, started by many people acting independently, often merges into one huge group of friends and acquaintances. This merging occurs when one person fans a Page after seeing two or more friends (who are on separate chains) fan that same Page. In fact, for some popular Pages, more than 90% of the fans can belong to a single group of people who are all somehow connected to one another. These clusters typically contain thousands of separate starting points, that is, individuals who independently decided to fan a particular Page.
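If edge direction is ignored, the merged groups described here are the connected components of a Page's diffusion graph. One way to compute them is a union-find structure; the sketch below uses hypothetical helper names and assumes every node in `edges` also appears in `fans`.

```python
from collections import defaultdict


def clusters(fans, edges):
    """Connected components of a Page's diffusion graph, largest first."""
    parent = {u: u for u in fans}

    def find(u):
        # Path halving: point each visited node at its grandparent.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    # Edge direction is ignored: an actor and its follower share a cluster.
    for actor, follower in edges:
        root_a, root_f = find(actor), find(follower)
        if root_a != root_f:
            parent[root_a] = root_f

    groups = defaultdict(set)
    for u in fans:
        groups[find(u)].add(u)
    return sorted(groups.values(), key=len, reverse=True)


def largest_cluster_share(fans, edges):
    """Fraction of a Page's fans that belong to its biggest cluster."""
    groups = clusters(fans, edges)
    return len(groups[0]) / len(fans) if groups else 0.0
```

A follower with parents on two different chains (like Charlie in the earlier example) is exactly where two components are joined by the union step.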
After examining the distribution of start nodes and follower nodes in these clusters, the authors find no evidence for the theory that just a few users are responsible for the popularity of Pages. Instead, across all Pages of meaningful size (>1000 fans), an average of 14.8% (SD 7.9%) of the fans in each Page's biggest cluster were start nodes. Each of these fans arrived independently (presumably by finding the Page through Facebook Search or an advertisement) and started their own chain, and these chains eventually merged as the rest of the fan base took shape.
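The start-node statistic can be computed per Page and then averaged across Pages. A minimal sketch, assuming each Page is supplied as a hypothetical `(fan_count, biggest_cluster, edges)` tuple and that a start node is any cluster member with no incoming edge; the summary does not say whether the reported SD is a sample or population figure, so using `statistics.stdev` (sample SD) is an assumption.

```python
import statistics


def start_share(cluster, edges):
    """Fraction of a cluster's members with no incoming (actor -> follower) edge."""
    followers = {f for _, f in edges}
    starts = [u for u in cluster if u not in followers]
    return len(starts) / len(cluster)


def summarize_start_shares(pages, min_fans=1000):
    """Mean and sample SD of start-node share across Pages with > min_fans fans.

    pages: iterable of (fan_count, biggest_cluster, edges) tuples (hypothetical format).
    statistics.stdev needs at least two qualifying Pages.
    """
    shares = [start_share(cluster, edges)
              for fan_count, cluster, edges in pages
              if fan_count > min_fans]
    return statistics.mean(shares), statistics.stdev(shares)
```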
Analysis: Prediction of Maximum Chain Length
For each of the 179,010 start nodes in their chains dataset, the authors compute all of the node's diffusion chains and take the longest. This value, max_chain, is their response variable. They then fit a zero-inflated negative binomial regression, a model suited to overdispersed count data (variance >> mean) that also contain excess zeros. The predictor variables for their regression model are:
• log age
• gender
• log Facebook_age (number of days the user has been a member of Facebook)
• log activity_count (messages sent + photos uploaded + Facebook wall posts sent)
• log friend_count (number of Facebook friends)
• log feed_exposure (number of friends who saw the News Feed story broadcasting the user's fanning action)
• log popularity (number of friends who “care about” the start node enough that the News Feed algorithm considers broadcasting the start node's Page fanning story)
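For reference, a standard zero-inflated negative binomial (NB2) model for max_chain can be written as below. This is the textbook form, not necessarily the authors' exact specification: Y_i is start node i's max_chain, π_i is the probability of a structural zero, μ_i is the count-model mean, α > 0 is the dispersion parameter, gender is assumed to be a 0/1 indicator, and the covariates z_i of the zero-inflation part are not stated in this summary.

```latex
\begin{aligned}
\Pr(Y_i = 0) &= \pi_i + (1-\pi_i)\left(\frac{1}{1+\alpha\mu_i}\right)^{1/\alpha} \\
\Pr(Y_i = k) &= (1-\pi_i)\,\frac{\Gamma(k+1/\alpha)}{\Gamma(1/\alpha)\,k!}
  \left(\frac{1}{1+\alpha\mu_i}\right)^{1/\alpha}
  \left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{k}, \qquad k \ge 1 \\
\log \mu_i &= \beta_0 + \beta_1 \log(\text{age}_i) + \beta_2\,\text{gender}_i
  + \beta_3 \log(\text{Facebook\_age}_i) + \beta_4 \log(\text{activity\_count}_i) \\
  &\quad + \beta_5 \log(\text{friend\_count}_i) + \beta_6 \log(\text{feed\_exposure}_i)
  + \beta_7 \log(\text{popularity}_i) \\
\operatorname{logit}(\pi_i) &= \mathbf{z}_i^{\top}\boldsymbol{\gamma}
\end{aligned}
```

In the count component the variance is μ_i(1 + αμ_i), which exceeds the mean μ_i whenever α > 0; this is why the negative binomial, rather than the Poisson, fits counts with variance >> mean.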
The authors find that neither demographic characteristics nor number of Facebook friends has a practically meaningful coefficient; that is, neither appears to play an important role in predicting maximum diffusion chain length.
The authors find that many nodes are chain initiators, which differs from the starting conditions of most theoretical diffusion models.
Facebook Page-fanning chains tend to last longer and involve more people, but an initiator's demographic properties and site usage characteristics do not appear to have any meaningful effect on that node's maximum diffusion chain length.