TREC Blog06
From Cohen Courses
This test collection was generated by NIST for the Blog track at the Text REtrieval Conference (TREC) in 2006. The query topics and relevance judgements for that year's track are available separately from NIST.
The Blog06 test collection consists of a crawl of feeds (XML), their associated permalinks (HTML; these are the retrieval units), and homepages, gathered from December 2005 through early 2006. The document set comprises 100,649 feeds (38GB), 2.8 million permalinks (75GB), and 325,000 homepages (20GB).
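For a rough sense of scale, the figures above can be added up. The Python sketch below is only illustrative: the dictionary layout and function names are assumptions made here, not part of the collection or any NIST tooling. The permalink and homepage counts are rounded in the description, so the results are approximate.

```python
# Figures copied from the description above. The permalink and homepage
# counts are rounded in the source, so all totals are approximate.
COMPONENTS = {
    "feeds": {"documents": 100_649, "size_gb": 38},
    "permalinks": {"documents": 2_800_000, "size_gb": 75},
    "homepages": {"documents": 325_000, "size_gb": 20},
}


def collection_totals(components):
    """Return (total document count, total size in GB) across all components."""
    total_docs = sum(c["documents"] for c in components.values())
    total_gb = sum(c["size_gb"] for c in components.values())
    return total_docs, total_gb


def permalinks_per_feed(components):
    """Return the approximate average number of permalinks per feed."""
    return components["permalinks"]["documents"] / components["feeds"]["documents"]


if __name__ == "__main__":
    docs, gigabytes = collection_totals(COMPONENTS)
    print(f"about {docs:,} documents, about {gigabytes} GB in total")
    print(f"about {permalinks_per_feed(COMPONENTS):.1f} permalinks per feed")
```

By these figures the collection is roughly 133GB overall, and the permalinks (the retrieval units) make up more than half of it.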