From Cohen Courses

BLOG06 is a TREC test collection created and distributed by the University of Glasgow.

The dataset contains feeds, permalinks, and homepages collected over an 11-week period:

  • 100,649 feeds
  • 3,215,171 permalinks
  • 324,880 homepages

17,969 spam blogs were added to the corpus in order to make it more realistic.

More information about the dataset can be found in the related paper, Macdonald and Ounis 2006.