Network of Blogs

From Cohen Courses

This is one of the datasets discussed in Social Media Analysis 10-802 in Spring 2010.

  • # Blogs = 45,000
  • # Posts = 10,500,000
  • # Links = 16,200,000
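To give a rough sense of scale, the counts above imply the following simple averages. This is only a back-of-the-envelope sketch. It assumes that every post belongs to one of the 45,000 blogs and that the link count is taken over posts.

```python
# Counts as listed above.
n_blogs, n_posts, n_links = 45_000, 10_500_000, 16_200_000

posts_per_blog = n_posts / n_blogs  # average number of posts per blog
links_per_post = n_links / n_posts  # average number of links per post (assumption: links counted per post)

print(f"{posts_per_blog:.1f} posts/blog, {links_per_post:.2f} links/post")
# 233.3 posts/blog, 1.54 links/post
```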

This dataset was used in the paper "Cost-Effective Outbreak Detection in Networks".

This dataset was generated by sampling from a much larger set of 2.5 million blogs. The authors kept only blogs that received at least 3 in-links during the first 6 months of 2006, and then collected all of those blogs' posts for the full year of 2006. The posts carry rich metadata, including timestamps, which allows information cascades to be extracted.
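The two steps described above, selecting blogs by an in-link threshold and then grouping timestamped posts into cascades, can be sketched as follows. This is a minimal illustration, not the procedure the authors actually used. The `Post` record layout and the helpers `select_blogs` and `extract_cascades` are hypothetical. Ignoring links between posts of the same blog is an assumption. A "cascade" is simplified here to a connected group of posts joined by links that point to strictly earlier posts.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Post:
    # Hypothetical record layout; the real dataset's schema may differ.
    post_id: str
    blog_id: str
    time: datetime
    links_to: List[str] = field(default_factory=list)  # ids of posts this post cites


def select_blogs(posts, min_in_links=3,
                 start=datetime(2006, 1, 1), end=datetime(2006, 7, 1)):
    """Return ids of blogs with >= min_in_links in-links from other blogs in [start, end)."""
    by_id = {p.post_id: p for p in posts}
    in_links = defaultdict(int)
    for p in posts:
        if not (start <= p.time < end):
            continue
        for target_id in p.links_to:
            target = by_id.get(target_id)
            # Assumption: links between posts of the same blog are not counted.
            if target is not None and target.blog_id != p.blog_id:
                in_links[target.blog_id] += 1
    return {blog for blog, n in in_links.items() if n >= min_in_links}


def extract_cascades(posts):
    """Group posts into cascades: connected components of links that point back in time."""
    by_id = {p.post_id: p for p in posts}
    parent = {pid: pid for pid in by_id}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for p in posts:
        for target_id in p.links_to:
            target = by_id.get(target_id)
            # Keep only links to strictly earlier posts, so a cascade follows time order.
            if target is not None and target.time < p.time:
                parent[find(p.post_id)] = find(target_id)

    groups = defaultdict(list)
    for pid in by_id:
        groups[find(pid)].append(pid)
    # A cascade needs at least two posts; members are listed in timestamp order.
    return [sorted(g, key=lambda pid: by_id[pid].time)
            for g in groups.values() if len(g) > 1]


# Toy example with made-up posts (not taken from the dataset).
def d(month, day):
    return datetime(2006, month, day)

posts = [
    Post("a1", "A", d(1, 5)),
    Post("b1", "B", d(1, 6), ["a1"]),
    Post("c1", "C", d(2, 1), ["a1"]),
    Post("d1", "D", d(3, 1), ["a1", "b1"]),
    Post("b2", "B", d(8, 1)),
    Post("e1", "E", d(9, 1), ["b2"]),
]
print(select_blogs(posts))      # {'A'}
print(extract_cascades(posts))  # [['a1', 'b1', 'c1', 'd1'], ['b2', 'e1']]
```

Union-find keeps the grouping close to linear in the number of links, which matters at the scale of millions of posts. A stricter cascade definition would need a different grouping step. One example is a tree in which each post attaches only to the first earlier post it cites.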