AOL query log dataset
From Cohen Courses
Revision as of 14:19, 20 April 2010 by PastStudents
To build this dataset, the 1,050 most frequent queries were first selected from the AOL query log. To make the dataset more diverse, another 1,050 queries were sampled at random from the log, each with probability proportional to its relative frequency. Most of these queries are in English. Finally, for each query, the top 500 results returned by Google, Yahoo!, or MSN were retained as seeds.
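The two-stage query selection can be sketched as follows. This is a minimal illustration, not the original construction code. The helper `build_query_set` is hypothetical, and it assumes the log has already been reduced to a `collections.Counter` mapping each query string to its frequency. The description does not say whether the random sample excludes the top queries or is drawn with replacement, so the sketch assumes it samples from the remaining queries without replacement. For the frequency-weighted draw it uses the Efraimidis-Spirakis key method (each query gets key `u ** (1 / count)` and the largest keys are kept), which is one standard way to do weighted sampling without replacement. Retrieving the search-engine results is not shown.

```python
import heapq
import random
from collections import Counter


def build_query_set(freqs: Counter, n_top: int = 1050, n_sampled: int = 1050, seed: int = 0) -> list:
    """Hypothetical helper: top-n_top queries plus n_sampled frequency-weighted draws.

    Assumption: the random sample is drawn without replacement from the queries
    not already in the top set, with weight equal to the raw query count.
    """
    rng = random.Random(seed)

    # Stage 1: the n_top most frequent queries.
    top = [q for q, _ in freqs.most_common(n_top)]
    chosen = set(top)

    # Stage 2: Efraimidis-Spirakis weighted sampling without replacement.
    # Each remaining query gets key u ** (1 / count) with u ~ Uniform[0, 1);
    # the n_sampled largest keys form a sample weighted by frequency.
    keyed = (
        (rng.random() ** (1.0 / count), q)
        for q, count in freqs.items()
        if q not in chosen and count > 0
    )
    sampled = [q for _, q in heapq.nlargest(n_sampled, keyed)]

    return top + sampled


if __name__ == "__main__":
    # Toy log; real input would be counts over the full AOL query log.
    toy = Counter({"google": 10, "yahoo": 8, "weather": 5, "maps": 3, "ebay": 1, "news": 1})
    print(build_query_set(toy, n_top=2, n_sampled=3, seed=1))
```

The key method avoids repeatedly renormalizing weights after each draw, which matters when the pool contains millions of distinct queries, since it needs only one pass over the counts and a bounded heap.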
Link: AOL query log dataset