The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity

From Cohen Courses

Online Version

An electronic version of this paper can be downloaded here: [1]

Summary

This paper develops and applies a methodology for assessing the impact of content-agnostic factors (e.g. total view count, uploader followers, and video age) on video popularity. To evaluate the relative influence of the different factors, three statistical tools are used: 1) Principal Component Analysis (PCA) for grouping the variables responsible for the variation in popularity, 2) correlation and collinearity analysis for identifying interrelated variables, and 3) multi-linear regression with variable selection for identifying the most informative variables. The dataset contains 48 clone sets of YouTube videos (sets of videos with essentially identical content), with three types of information: video statistics, historical view counts, and influential events. The most important findings from the analysis include: 1) inaccurate conclusions may be reached when video content is not controlled for; 2) total view count is the most important explanatory variable, except for very young videos; 3) content-agnostic factors can also help explain popularity dynamics to some extent; and 4) first movers have a strong advantage in video popularity.
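As a rough illustration of the first two tools, the sketch below runs PCA and a pairwise correlation check on a synthetic "clone set". It is not the paper's code or data: the variable names, the synthetic data generator, and the 0.9 correlation threshold are all assumptions made for this example.

```python
import numpy as np

# Hypothetical variable names; the paper's actual feature set may differ.
NAMES = ["total_views", "ratings", "uploader_followers",
         "uploader_views", "video_age"]


def make_synthetic_clone_set(n=200, seed=0):
    """Synthetic data: two latent factors (video and uploader popularity)
    each drive two observed variables; video_age is independent."""
    rng = np.random.default_rng(seed)
    video_pop = rng.normal(size=n)
    uploader_pop = rng.normal(size=n)
    return np.column_stack([
        video_pop + 0.1 * rng.normal(size=n),
        video_pop + 0.1 * rng.normal(size=n),
        uploader_pop + 0.1 * rng.normal(size=n),
        uploader_pop + 0.1 * rng.normal(size=n),
        rng.normal(size=n),
    ])


def pca(X):
    """PCA via SVD of the standardized data matrix.
    Returns (explained variance ratios, component loadings as rows)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s ** 2 / (Z.shape[0] - 1)
    return variances / variances.sum(), Vt


def highly_correlated_pairs(X, names, threshold=0.9):
    """List variable pairs whose absolute Pearson correlation exceeds
    the (assumed) threshold; such pairs are candidates for collinearity."""
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


X = make_synthetic_clone_set()
ratios, components = pca(X)
pairs = highly_correlated_pairs(X, NAMES)

print("explained variance ratios:", np.round(ratios, 3))
for a, b, r in pairs:
    print(f"collinear candidates: {a} ~ {b} (r = {r:.2f})")
```

On data built this way, the leading components should load on the video-popularity and uploader-popularity groups, and the correlated pairs flag variables that would be redundant if entered together in a regression.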

Results

1. When Principal Component Analysis (PCA) is applied to each of the 48 collected clone sets, the first two principal components roughly correspond to video popularity and uploader popularity metrics. For clone sets with large variation in video age, the third principal component corresponds to video characteristics such as video age and video quality.

2. Multivariate regression analysis with a best-subset search is applied to the individual clone sets. The results show that total view count is the most important explanatory variable, with video age the second most important.

3. The regression results of the content-aware extended model are better than those of the regular individual clone set models.

4. Without controlling for video content, one might draw inaccurate conclusions from regression analysis.

5. Popularity evolution is scale-free and strongly controlled by rich-get-richer behavior.

6. First movers have a clear advantage in popularizing uploaded videos.

7. For very young (newly uploaded) videos, the uploader's social network has more impact than the total view count.
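The best-subset regression mentioned in result 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the predictor names, the synthetic response, and the use of adjusted R² as the selection criterion are all choices made for this example (the summary does not say which criterion the authors used).

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept.
    Returns (coefficients, residual sum of squares)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)


def best_subset(X, y, names):
    """Exhaustively fit every non-empty subset of predictors and keep
    the one with the highest adjusted R^2 (an assumed criterion)."""
    n, p = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    best_score, best_names = -np.inf, []
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            _, rss = fit_ols(X[:, list(cols)], y)
            adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
            if adj_r2 > best_score:
                best_score = adj_r2
                best_names = [names[c] for c in cols]
    return best_score, best_names


# Synthetic example with hypothetical variable names: the response
# depends only on total_views and video_age; the rest are pure noise.
rng = np.random.default_rng(1)
names = ["total_views", "video_age", "uploader_followers", "keywords"]
X = rng.normal(size=(200, len(names)))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * rng.normal(size=200)

score, selected = best_subset(X, y, names)
print(f"adjusted R^2 = {score:.3f}, selected = {selected}")
```

Exhaustive search fits 2^p - 1 models, so it is only practical for a small number of candidate variables. Adjusted R² may also admit an occasional noise variable, so a stricter criterion could be substituted without changing the structure of the search.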

Related Papers

Study Plan

  • Multi-linear regression
  • Correlation and collinearity analysis techniques
  • Principal Component Analysis
  • Best variable subset selection