Heymann, WSDM 2008

From Cohen Courses

Citation

Paul Heymann, Georgia Koutrika, Hector Garcia-Molina: Can social bookmarking improve web search? WSDM 2008: 195-206

 author    = {Paul Heymann and
              Georgia Koutrika and
              Hector Garcia-Molina},
 title     = {Can social bookmarking improve web search?},
 booktitle = {WSDM},
 year      = {2008},
 pages     = {195-206},
 ee        = {http://doi.acm.org/10.1145/1341531.1341558},
 crossref  = {DBLP:conf/wsdm/2008},

Online version

Paper: Can Social Bookmarking Improve Web Search?

Video: First ACM International Conference on Web Search and Data Mining - WSDM 2008

Abstract from the paper

Social bookmarking is a recent phenomenon which has the potential to give us a great deal of data about pages on the web. One major question is whether that data can be used to augment systems like web search. To answer this question, over the past year we have gathered what we believe to be the largest dataset from a social bookmarking site yet analyzed by academic researchers. Our dataset represents about forty million bookmarks from the social bookmarking site del.icio.us. We contribute a characterization of posts to del.icio.us: how many bookmarks exist (about 115 million), how fast is it growing, and how active are the URLs being posted about (quite active). We also contribute a characterization of tags used by bookmarkers. We found that certain tags tend to gravitate towards certain domains, and vice versa. We also found that tags occur in over 50 percent of the pages that they annotate, and in only 20 percent of cases do they not occur in the page text, backlink page text, or forward link page text of the pages they annotate. We conclude that social bookmarking can provide search data not currently provided by other sources, though it may currently lack the size and distribution of tags necessary to make a significant impact.
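The abstract's tag-coverage measurements (tags found in the page text, or in none of the page text, backlink page text, and forward-link page text) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each annotation is a (tag, page) pair whose page text and linked-page texts have already been crawled. The `Annotation` class, the `occurs_in` and `tag_coverage` helpers, and the toy data are all hypothetical. The case-insensitive whole-word match is an assumption, not the paper's exact matching rule.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """One (tag, page) pair; the text fields stand in for crawled page content (hypothetical)."""
    tag: str
    page_text: str
    backlink_texts: list[str] = field(default_factory=list)      # text of pages linking to this page
    forward_link_texts: list[str] = field(default_factory=list)  # text of pages this page links to

def occurs_in(tag: str, text: str) -> bool:
    # Assumed matching rule: case-insensitive whole-word match.
    return re.search(r"\b" + re.escape(tag) + r"\b", text, re.IGNORECASE) is not None

def tag_coverage(annotations: list[Annotation]) -> tuple[float, float]:
    """Return (fraction of annotations whose tag is in the page text,
    fraction whose tag is in none of the page, backlink, or forward-link texts)."""
    if not annotations:
        raise ValueError("need at least one annotation")
    in_page = in_none = 0
    for a in annotations:
        page_hit = occurs_in(a.tag, a.page_text)
        link_hit = any(occurs_in(a.tag, t) for t in a.backlink_texts + a.forward_link_texts)
        in_page += page_hit                  # bools count as 0/1
        in_none += not (page_hit or link_hit)
    n = len(annotations)
    return in_page / n, in_none / n

# Hypothetical toy annotations, not data from the paper.
anns = [
    Annotation("python", "Learn Python programming"),
    Annotation("recipes", "Chocolate cake instructions", backlink_texts=["great recipes here"]),
    Annotation("toread", "A long essay about history"),
    Annotation("linux", "Install Linux today"),
]
print(tag_coverage(anns))  # (0.5, 0.25)
```

Here two of the four tags appear in their page text, and only "toread" appears nowhere in the page or its linked pages; a tag like "toread" illustrates why some tags cannot be recovered from page content.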

Summary

Synopsis

In this paper, Heymann et al. study whether user annotations from social bookmarking can be used by search engines to improve search quality. They present eleven research results about the social bookmarking system del.icio.us and show how data crawled from the del.icio.us website can help augment web search. They examine both the positive and the negative implications of social bookmarking for web search. Their main findings are summarized below:

- Pages posted to del.icio.us are often recently modified.

- Approximately 25% of URLs posted by users are new, unindexed pages.

- Roughly 9% of results for search queries are URLs present in del.icio.us.

- While some users are more prolific than others, the top 10% of users only account for 56% of posts.

- 30-40% of URLs and approximately one in eight domains posted were not previously in del.icio.us.

- Popular query terms and tags overlap significantly.

- Approximately 120,000 URLs are posted to del.icio.us each day.

- There are roughly 115 million public posts, corresponding to about 30-50 million unique URLs.

- Tags are present in the page text of 50% of the pages they annotate and in the titles of 16% of the pages they annotate.

- Domains are often highly correlated with particular tags and vice versa.
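The user-concentration result in the list above (the top 10% of users account for only 56% of posts) is a simple share computation. The following minimal Python sketch shows one way to compute it, assuming posts are available as (user, URL) pairs. The `top_user_share` function and the toy posts are hypothetical and do not reproduce the paper's data or code.

```python
from collections import Counter

def top_user_share(posts_by_user: dict[str, int], top_fraction: float = 0.10) -> float:
    """Return the share of all posts made by the most prolific `top_fraction` of users."""
    counts = sorted(posts_by_user.values(), reverse=True)  # most prolific users first
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one user with at least one post")
    # Size of the top group, rounded to the nearest user but never empty.
    k = max(1, round(len(counts) * top_fraction))
    return sum(counts[:k]) / total

# Hypothetical raw posts as (user, url) pairs; Counter tallies posts per user.
posts = [("alice", "http://a.example")] * 46 + [
    (f"user{i}", f"http://site{i}.example") for i in range(1, 10) for _ in range(6)
]
posts_by_user = Counter(user for user, _url in posts)
print(top_user_share(posts_by_user))  # 0.46
```

With ten users, the top 10% is a single user (46 of 100 posts). A real crawl would feed millions of per-user counts into the same function.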


Datasets

The authors used the following datasets in their analysis:

- Crawl of del.icio.us (Heymann dataset): about forty million bookmarks from the social bookmarking site del.icio.us.

- Philipp Keller's dataset: a del.icio.us posting dataset.

- AOL query log dataset

Related Works and Papers

Related work on using anchor text and link structure to improve web search includes Eiron and McCurley (2003). Both Bao et al. (2007) and Yanbe et al. (2007) proposed methods for using tagging data in search engines, but neither examined del.icio.us and its applicability to search engines, and both used relatively small datasets. Independently of web search, researchers have also studied tagging extensively, but that work is outside the scope of this paper.

Conclusion

They conclude that social bookmarking websites can provide data for web search that is not available from other sources. However, social bookmarking may currently lack the size and distribution of tags needed to make a significant impact.

References

1- N. Eiron and K. S. McCurley. Analysis of Anchor Text for Web Search. In SIGIR ’03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 459–460, New York, NY, USA, 2003. ACM.

2- S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Optimizing Web Search Using Social Annotations. In WWW ’07: Proceedings of the 16th International Conference on World Wide Web, pages 501–510, New York, NY, USA, 2007. ACM.

3- Y. Yanbe, A. Jatowt, S. Nakamura, and K. Tanaka. Can Social Bookmarking Enhance Search in the Web? In JCDL ’07: Proceedings of the 2007 Conference on Digital Libraries, pages 107–116, New York, NY, USA, 2007. ACM.