Difference between revisions of "Catching and Forecasting Popular Videos on Youtube"

From Cohen Courses

Revision as of 15:43, 6 October 2012

Teammate WANTED!

Overview

Have you ever considered why some videos could attract millions of views within just a few days?

Recently, research groups such as Google Research have begun to study the characteristics of popular videos, hoping that the findings will benefit marketing and advertising (YouTube Analysis). This project intends to model and predict the view patterns of popular videos, with an emphasis on viral videos[1]. Specifically, it involves the following tasks:

  1. Dataset collection.
  2. Modeling the view growth distribution based on the BA model.
  3. Popularity prediction using a naive method such as SVM or regression.
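
As a rough illustration of task 3, the sketch below fits a least-squares line to log-transformed view counts, in the spirit of the log-linear relationship between early and later popularity reported by Szabó and Huberman (see Related Work). The function names and the toy view counts are hypothetical, and this is only one possible baseline, not the project's actual predictor.

```python
import math

def fit_log_linear(early_views, later_views):
    """Fit log(later) = a + b * log(early) by ordinary least squares."""
    xs = [math.log(v) for v in early_views]
    ys = [math.log(v) for v in later_views]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_later_views(a, b, early_view_count):
    """Map an early view count to a predicted later view count."""
    return math.exp(a + b * math.log(early_view_count))

# Made-up training data: views after an early window vs. a later window.
early = [100, 1000, 10000, 100000]
later = [300, 3000, 30000, 300000]

a, b = fit_log_linear(early, later)
estimate = predict_later_views(a, b, 5000)
```

All view counts must be positive for the logarithm to be defined; videos with zero early views would need separate handling.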

Team

Lu Jiang

Available

Datasets

In Google's paper (Tom Broxton, 2011), the authors use an internal dataset that is unavailable for research outside of Google. There is a public dataset[2] (http://www.ida.liu.se/~nikca/papers/kdd12.html) containing metadata for 1,761 videos. However, several limitations render it inappropriate for this study:

  • Most videos are not popular videos (there are almost no viral videos).
  • Information is incomplete. For example, the comments and the information about the uploader are missing.
  • Clone videos, i.e. copies of a video uploaded by different users, are manually labeled. Therefore the dataset may not scale easily.

A dataset of popular YouTube videos (initially 2,000 to 3,000 videos) will be collected in this project. For each video, the following information will be included:

  • Metadata
  • Comments
  • Video and thumbnail
  • Public statistics
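
One way to hold the four kinds of information listed above is a single record per video. The sketch below is only an assumed layout: the class name, field names, and example values are hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VideoRecord:
    """One collected video; every field name here is illustrative."""
    video_id: str
    metadata: Dict[str, str] = field(default_factory=dict)     # e.g. title, uploader
    comments: List[str] = field(default_factory=list)          # raw comment texts
    video_path: str = ""                                       # local path to the video file
    thumbnail_path: str = ""                                   # local path to the thumbnail
    public_stats: Dict[str, int] = field(default_factory=dict) # e.g. view count

record = VideoRecord(
    video_id="example-id",
    metadata={"title": "Example"},
    public_stats={"views": 12345},
)
```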

The clone videos will be identified by an automatic approach called Near Duplicate Video Detection [3].
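
The method in [3] is not described on this page. Purely as a stand-in to show the idea of matching clones automatically, the sketch below compares two frames with a simple average hash and a Hamming distance. This is a swapped-in technique, not the cited approach; the function names, the threshold, and the tiny 2x2 "frames" are hypothetical.

```python
def average_hash(gray_pixels):
    """Return a bit tuple: 1 where a pixel is brighter than the frame mean."""
    flat = [p for row in gray_pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)

def hamming_distance(hash_a, hash_b):
    """Count the positions where two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

def is_near_duplicate(frame_a, frame_b, max_distance=2):
    """Treat frames as near duplicates if their hashes differ in few bits.

    Assumes both frames are grayscale grids of the same size.
    """
    distance = hamming_distance(average_hash(frame_a), average_hash(frame_b))
    return distance <= max_distance

# Toy grayscale frames (values 0-255), made up for illustration.
original = [[10, 200], [220, 30]]
reencoded = [[14, 190], [215, 35]]   # same picture with slight pixel noise
different = [[200, 10], [30, 220]]   # inverted layout

same_pair = is_near_duplicate(original, reencoded)
other_pair = is_near_duplicate(original, different)
```

A real detector would compare many frames per video and would need to tolerate cropping, scaling, and re-encoding, which this sketch does not attempt.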

Baseline Method

The proposed method will be compared against the other

Advantages

  • Existing knowledge and code for crawling metadata and videos on YouTube.
  • Existing knowledge and code for video and image content analysis, such as Near Duplicate Detection and Semantic Object Detection.

Challenges

  • Building the first public dataset of popular YouTube videos with comprehensive information.
  • Video and Thumbnail content analysis (Semantic Object Detection).
  • Growth model for the view pattern of viral videos.
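
For the growth-model challenge, one plausible reading of the "BA model" named in the Overview is the Barabási-Albert preferential-attachment model; that expansion is an assumption, since the page does not spell it out. Under that assumption, a minimal "rich get richer" simulation might look like the sketch below, where each new view goes to a video with probability proportional to its current views plus one. The function name, parameters, and sizes are hypothetical.

```python
import random

def simulate_view_growth(num_videos, num_views, seed=0):
    """Distribute views one at a time, favoring already popular videos."""
    rng = random.Random(seed)
    views = [0] * num_videos
    for _ in range(num_views):
        # The +1 gives videos with zero views a chance to be picked.
        weights = [v + 1 for v in views]
        chosen = rng.choices(range(num_videos), weights=weights, k=1)[0]
        views[chosen] += 1
    return views

views = simulate_view_growth(num_videos=50, num_views=5000)
ranked = sorted(views, reverse=True)
```

Inspecting `ranked` shows how unevenly the views end up distributed; fitting such a model to real viral videos would additionally require time-stamped view data.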


Related Work

[4] Tom Broxton, Yannet Interian, Jon Vaver, Mirjam Wattenhofer: Catching a viral video. Journal of Intelligent Information Systems 2011: 1-19.

[5] Youmna Borghol, Siddharth Mitra, Sebastien Ardon, Niklas Carlsson, Derek L. Eager, Anirban Mahanti: Characterizing and modelling popularity of user-generated videos. Perform. Eval. 68(11): 1037-1055 (2011)

[6] Y. Borghol, S. Ardon, N. Carlsson, D. Eager, and A. Mahanti, Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), Beijing, China, August 2012, to appear.

[7] Gábor Szabó, Bernardo A. Huberman: Predicting the popularity of online content. Commun. ACM 53(8): 80-88 (2010)