TREC Blog06

From Cohen Courses

Latest revision as of 01:16, 27 September 2012

This test collection was generated by NIST for the Blog track at the Text REtrieval Conference (TREC) in 2006. Query topics and relevance judgements for that year's track are available separately from NIST.

The Blog06 test collection consists of a crawl of feeds (XML), their associated permalinks (HTML, the retrieval units), and blog homepages, collected from December 2005 through early 2006. The document set includes 100,649 feeds (38 GB), 2.8 million permalinks (75 GB), and 325,000 homepages (20 GB).

It is used by:
* [[RelatedPapers::Yang et al 2007 Fusion approach to finding opinions in blogosphere]]