BLOG06 is a TREC test collection created and distributed by the University of Glasgow.
The dataset contains feeds, permalinks and homepages collected over an 11-week period:
- 100,649 feeds
- 3,215,171 permalinks
- 324,880 homepages
17,969 spam blogs were added to the corpus in order to make it more realistic.
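
To get a feel for the corpus composition, the counts listed above can be summed and compared. This is only a minimal sketch: the names `counts`, `total` and `shares` are illustrative and not part of the collection or of any tool shipped with it, and treating feeds, permalinks and homepages as one pool of documents is an assumption made for this calculation.

```python
# Document counts as listed above for BLOG06.
counts = {
    "feeds": 100_649,
    "permalinks": 3_215_171,
    "homepages": 324_880,
}

# Assumption: the three document types are simply pooled together.
total = sum(counts.values())

# Fraction of the pooled documents contributed by each type.
shares = {kind: n / total for kind, n in counts.items()}

print(f"total documents: {total:,}")
for kind, share in shares.items():
    print(f"{kind}: {share:.1%}")
```

Under this assumption the three types add up to 3,640,700 documents, most of which are permalinks.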
More information about the dataset can be found at RelatedPaper:Macdonald and Ounis 2006