Pasca, WWW 2007

From Cohen Courses
Revision as of 01:44, 27 October 2010 by PastStudents (talk | contribs)
Citation

Pasca, M. 2007. Organizing and Searching the World Wide Web of Facts Step Two: Harnessing the Wisdom of the Crowds. In Proceedings of the 16th World Wide Web Conference (WWW-07), pages 101-110, Banff, Canada.

Online version

WWW-07

Summary

The first step towards acquiring an extensive World Wide Web of facts is mining facts from Web documents. This step has been described in this paper. To get the most out of step 1, the author suggests learning the types of facts and class attributes of common interest from people, in the form of Web search query logs. The author therefore introduces step 2: mining query logs to obtain more attributes for a target class, starting from 5 seed attributes or 10 seed instances.
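To make the idea of mining query logs for class attributes concrete, here is a minimal sketch. It is an illustrative simplification, not the method of the paper: it assumes that a query which starts or ends with a known instance of the class leaves a candidate attribute as its remainder, and it ranks candidates by how many distinct seed instances they occur with. The function name candidate_attributes, the list of linking words, and the toy query log are all hypothetical.

```python
from collections import defaultdict

# Hypothetical list of linking words dropped from the end of a remainder.
LINKING_WORDS = {"of", "for", "in"}

def candidate_attributes(queries, seed_instances):
    """Rank query remainders by the number of distinct seed instances they occur with.

    Simplifying assumption: only queries that start or end with a seed
    instance are considered; everything else is skipped.
    """
    instances = {inst.lower() for inst in seed_instances}
    support = defaultdict(set)  # candidate phrase -> instances seen with it
    for query in queries:
        q = " ".join(query.lower().split())  # normalize case and whitespace
        for inst in instances:
            if q.endswith(" " + inst):
                remainder = q[: -len(inst)]
            elif q.startswith(inst + " "):
                remainder = q[len(inst):]
            else:
                continue
            words = remainder.split()
            # "side effects of" -> "side effects"
            if words and words[-1] in LINKING_WORDS:
                words = words[:-1]
            if words:
                support[" ".join(words)].add(inst)
    # Most widely shared candidates first; ties broken alphabetically.
    ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(phrase, len(insts)) for phrase, insts in ranked]

# Toy, made-up query log for a hypothetical "drug" class.
toy_log = [
    "side effects of aspirin",
    "ibuprofen side effects",
    "dosage of ibuprofen",
    "aspirin dosage",
    "buy aspirin",
    "weather in banff",
]
ranking = candidate_attributes(toy_log, ["aspirin", "ibuprofen"])
```

On this toy log, the phrases shared by both instances ("dosage", "side effects") rank above "buy", which occurs with only one. A noisy candidate such as "buy" shows why raw co-occurrence is not enough on its own and why some form of supervision, such as the seeds mentioned above, is useful.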

This paper

Mining queries vs documents

  • Amount of text: On average a query contains only 2 words, whereas a document may contain thousands. In theory, more data means better results.
  • Ambiguity: While Web documents have clear content, many Web queries are ambiguous because they lack grammatical structure and contain typos and misspellings. However, since most search engines do not provide interactive search sessions, Web users try to write clear and unambiguous queries to get their information quickly.
  • Capturing human knowledge: Since people form queries using their common-sense knowledge, queries are a good way of capturing this knowledge.