Y. Borghol et al. Performance Evaluation 68 2011
Citation
Youmna Borghol, Siddharth Mitra, Sebastien Ardon, Niklas Carlsson, Derek L. Eager, Anirban Mahanti: Characterizing and modelling popularity of user-generated videos. Perform. Eval. 68(11): 1037-1055 (2011)
Online version
Summary
This paper proposes a model for the popularity dynamics of YouTube videos, based on a data set the authors collected over 8 months. Specifically, it claims that the time at which an individual video reaches its peak views follows a certain distribution, called the time-to-peak distribution. Based on it, the authors divide a video's lifetime into three phases, namely before, at, and after the peak. Finally, a three-phase evolution model is put forward to explain the dynamics of views for newly uploaded videos.
Data set
Metadata for about 1.1 million videos, collected weekly over 8 months in the following two ways: 1) sampling from recently uploaded videos (29,791 videos collected) and 2) sampling using keyword search (1,135,253 videos collected).
There are two potential drawbacks in their collection methods. First, collecting views only weekly discards important information about popular videos, such as viral videos. According to Google's paper [2], many videos reach their peak views within a week and receive 25% social views on the day they are uploaded.
Second, about 97% of the video metadata were collected by keyword search, which is significantly biased by the current number of views and the age of the returned videos. The ranking algorithm often favors recently updated and popular videos among those of similar keyword relevance. In other words, the data set may not represent a random sample of YouTube videos (in contrast, the data set used in [3] was derived from random sampling).
Therefore, the conclusions drawn from this data set need careful examination and further validation.
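The "about 97%" share quoted above can be checked directly from the two sample sizes reported in this section. The variable names below are only illustrative.

```python
# Sample sizes as reported in the Data set section.
recently_uploaded = 29_791
keyword_search = 1_135_253

total = recently_uploaded + keyword_search
keyword_share = keyword_search / total  # fraction collected via keyword search

print(f"total videos: {total:,}")
print(f"keyword-search share: {keyword_share:.1%}")
```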
Method
Through empirical analysis, the authors claim that the time-to-peak distribution approximately follows an exponential distribution, and they find that a large fraction of videos peak within the first six weeks.
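To see how an exponential time-to-peak relates to the "first six weeks" observation, here is a minimal sketch. The mean time-to-peak of 4 weeks is a hypothetical value chosen only for illustration, not a figure from the paper, and the function name is my own.

```python
import math

def fraction_peaked_by(week, mean_weeks):
    """P(T <= week) when time-to-peak T is exponential with the given mean."""
    return 1.0 - math.exp(-week / mean_weeks)

# Hypothetical mean time-to-peak (assumption, not taken from the paper).
assumed_mean_weeks = 4.0
share_by_week_6 = fraction_peaked_by(6, assumed_mean_weeks)
print(f"fraction peaking within 6 weeks: {share_by_week_6:.2f}")
```

With a smaller assumed mean the fraction approaches 1; with a larger one it drops, so the claim depends on the fitted rate.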
As they mention in the paper, both exogenous and endogenous factors influence popularity. However, they ignore exogenous events entirely in their analysis. For example, the following is a view plot I generated for a popular video, "Dog Fight"; the peak in views clearly results from some exogenous event (probably Event D, which is "First embedded on jaramsie.pl"). Since such events are the major drivers of a popular video, an analysis without them would be of little use.
Then they propose three phases, namely before, at, and after the peak, which disagrees with the observation reported in [4] that some videos' view patterns are bi-modal (see the above figure). In addition, they assume that the weekly viewing rate within each phase is invariant, and based on this they propose the following model. Suppose
N | the total number of newly uploaded videos |
d | the total number of weeks |
the number of videos at week | |
time-to-peak distribution | |
the view distribution for videos in the before-peak phase | |
the view distribution for videos in the at-peak phase | |
the view distribution for videos in the after-peak phase |
Step 0 For each week i in {1, ..., d}:
Step 1 Sample N values from the time-to-peak distribution and count the number of videos in the at-peak phase (); update
,
,
such that
Step 2 Sample views from the before-peak, at-peak, and after-peak view distributions respectively.
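The steps above can be sketched in Python as follows. This is only one possible reading of the model, not the authors' implementation. It assumes an exponential time-to-peak (as claimed in the Method section) with a hypothetical mean, and it draws each video's peak week once rather than resampling every week. The lognormal view distributions and all names (`simulate_views`, `phase_samplers`) are placeholders for illustration.

```python
import random

def simulate_views(n_videos, n_weeks, mean_time_to_peak, phase_samplers, seed=0):
    """Three-phase sketch: assign each video a peak week, then sample weekly views by phase."""
    rng = random.Random(seed)
    # Draw a peak week (>= 1) per video from an exponential time-to-peak (assumption).
    peak_week = [1 + int(rng.expovariate(1.0 / mean_time_to_peak))
                 for _ in range(n_videos)]
    phase_counts, weekly_views = [], []
    for week in range(1, n_weeks + 1):           # Step 0: loop over weeks
        counts = {"before": 0, "at": 0, "after": 0}
        views = []
        for p in peak_week:                      # Step 1: count videos per phase
            phase = "before" if week < p else "at" if week == p else "after"
            counts[phase] += 1
            # Step 2: sample this week's views from the phase's distribution
            views.append(phase_samplers[phase](rng))
        phase_counts.append(counts)
        weekly_views.append(views)
    return phase_counts, weekly_views

# Hypothetical phase view distributions (placeholders, not fitted to any data).
samplers = {
    "before": lambda rng: rng.lognormvariate(2.0, 1.0),
    "at": lambda rng: rng.lognormvariate(4.0, 1.0),
    "after": lambda rng: rng.lognormvariate(1.0, 1.0),
}
counts, views = simulate_views(1000, 10, 4.0, samplers)
print(counts[0])
```

Because views within a phase are drawn independently each week, the sketch reflects the assumption that the weekly viewing rate is invariant within a phase; it cannot produce the bi-modal patterns mentioned above.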