Pasca, WWW 2007
Revision as of 00:58, 27 October 2010
Citation
Pasca M. 2007. Organizing and Searching the World Wide Web of Facts Step Two: Harnessing the Wisdom of the Crowds. In Proceedings of the 16th World Wide Web Conference (WWW-07). pages 101-110, Banff, Canada.
Online version
Summary
The first step towards acquiring an extensive World Wide Web of facts is to mine facts from Web documents. That step is described in the companion paper (Organizing and searching the World Wide Web of facts - step one: the one-million fact extraction challenge). To get the most out of step one, the author suggests learning which types of facts and class attributes are of common interest to people, as revealed by Web search query logs. This paper therefore introduces step two: mining query logs to acquire attributes for a target class, starting from only 5 seed attributes or 10 seed instances and without any handcrafted extraction patterns or domain-specific knowledge.
Mining queries vs documents
- Amount of text: On average, a query contains only about 2 words, whereas a document may contain thousands. In principle, more data means better results.
- Ambiguity: Web documents usually have clear content, while many Web queries are ambiguous because they lack grammatical structure and contain typos and misspellings. However, since most search engines do not offer interactive search sessions, users try to formulate clear, unambiguous queries to get their information quickly.
- Capturing human knowledge: Since people formulate queries using their common-sense knowledge, queries are a good way of capturing this knowledge.
The author used a random sample of 50 million unique, fully anonymized queries submitted to Google. A large fraction of these queries contain only 2-3 words, which makes it less likely that a class attribute appears in the same query as a class instance.
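To make the short-query problem concrete, here is a minimal Python sketch of one simple way to pull candidate attributes out of queries that mention a known class instance. It is an illustration under stated assumptions, not a reproduction of the paper's extractor: the two query shapes it accepts ("<attribute> of <instance>" and "<instance> <attribute>"), the word limit, and the helper name `extract_candidates` are all hypothetical.

```python
from collections import Counter


def extract_candidates(queries, instances, max_words=3):
    """Hypothetical sketch: count candidate attribute phrases found in
    queries shaped like '<attribute> of <instance>' or '<instance> <attribute>'.
    These two shapes are assumptions, not the paper's actual templates."""
    names = {name.lower() for name in instances}
    candidates = Counter()
    for query in queries:
        # Normalize case and whitespace so matching is a plain string test.
        q = " ".join(query.lower().split())
        for name in names:
            if q.endswith(" of " + name):
                attr = q[: -len(" of " + name)]
            elif q.startswith(name + " "):
                attr = q[len(name) + 1:]
            else:
                continue  # this instance does not occur in a supported shape
            # Keep only short, non-empty phrases as candidate attributes.
            if attr and len(attr.split()) <= max_words:
                candidates[attr] += 1
    return candidates


queries = ["population of France", "france capital", "capital of france",
           "weather today", "France"]
print(extract_candidates(queries, ["France", "Italy"]))
```

Queries that mention no instance ("weather today") or only the instance itself ("France") yield nothing, which is exactly why short queries make attribute and instance co-occurrence scarce.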
The evaluation uses 40 target classes from different domains. For each class, 5 independently chosen seed attributes are provided. Several similarity functions were compared. Each extracted attribute is assessed with one of three labels: vital (1.0), okay (0.5) or wrong (0.0).
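The summary does not say how the similarity functions are applied. One plausible reading, sketched here purely as an assumption, is this: describe each candidate attribute by a vector counting the query contexts it occurs in, add up the vectors of the seed attributes into a reference vector, and rank the remaining candidates by their similarity to that reference. Cosine similarity is only a stand-in for whichever functions the paper compared. The context keys, the toy counts and the name `rank_by_seeds` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_seeds(signatures, seeds, similarity=cosine):
    """Hypothetical sketch: rank non-seed candidates by the similarity of
    their context vector to the summed context vector of the seeds."""
    reference = Counter()
    for seed in seeds:
        reference.update(signatures.get(seed, Counter()))
    scored = [(similarity(vec, reference), attr)
              for attr, vec in signatures.items() if attr not in seeds]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [attr for _, attr in scored]


# Toy context counts; keys "a", "b", "c" stand for made-up query contexts.
signatures = {
    "capital": Counter({"a": 2, "b": 1}),
    "population": Counter({"a": 2, "b": 1}),
    "currency": Counter({"a": 1, "c": 1}),
    "lyrics": Counter({"c": 5}),
}
print(rank_by_seeds(signatures, ["capital"]))
```

Passing a different `similarity` callable is how one would compare alternative functions on the same candidates.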
The evaluation shows that attribute quality varies among classes, but the average precision over all target classes is high, both in absolute terms and relative to attributes extracted from query logs with handcrafted rules.
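The three assessment labels can be turned into a precision figure. This sketch assumes two things. First, precision at rank N is the mean label score of a class's top N ranked attributes. Second, the value averaged over target classes is the unweighted mean of the per-class values. The function names and the toy labels are hypothetical, not data from the paper.

```python
# Scores attached to the three assessment labels described above.
LABEL_SCORES = {"vital": 1.0, "okay": 0.5, "wrong": 0.0}


def precision_at(labels, n):
    """Mean score of the top-n ranked attributes of one class.
    If fewer than n attributes exist, average over those available."""
    top = labels[:n]
    if not top:
        return 0.0
    return sum(LABEL_SCORES[label] for label in top) / len(top)


def mean_precision_over_classes(labels_by_class, n):
    """Unweighted mean of per-class precision at rank n (assumed averaging)."""
    values = [precision_at(labels, n) for labels in labels_by_class.values()]
    return sum(values) / len(values)


# Toy assessments for two hypothetical classes, in ranked order.
labels_by_class = {
    "Country": ["vital", "vital", "okay", "wrong"],
    "Drug": ["okay", "wrong", "wrong", "vital"],
}
print(mean_precision_over_classes(labels_by_class, 4))
```

For example, the toy "Country" class scores (1.0 + 1.0 + 0.5 + 0.0) / 4 = 0.625 at rank 4.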