Sood et al., ICWSM 2007


Citation

Sood et al. TagAssist: Automatic Tag Suggestion for Blog Posts. In Proceedings of the International Conference on Weblogs and Social Media (26-28 March 2007).

Online version

ICWSM

Summary

This paper presents an interesting approach to tag recommendation. The basic ideas are:

  • Use existing, similar blog posts to recommend a set of tags to the user.
  • Use cosine similarity to score how relevant each existing blog post is to the post that needs tags. A large tagged blog corpus was used for evaluation.
  • Use clustering to address the problems of polysemy, synonymy and level variation.
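The retrieval and ranking idea in the first two points can be sketched as follows. This is a minimal illustration, not TagAssist's actual retrieval component: it assumes raw term-frequency vectors (no TF-IDF weighting), a naive tokenizer, and an in-memory corpus of `(post_text, tags)` pairs. The helper names `tokenize`, `cosine_similarity`, `rank_posts` and `candidate_tags` and the cutoff `k` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphanumeric tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # a, b: Counter objects mapping term -> raw term frequency.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_posts(target_text, corpus, k=2):
    # corpus: list of (post_text, tags) pairs (hypothetical format).
    # Returns the k most similar posts as (score, post_text, tags), best first.
    target_vec = Counter(tokenize(target_text))
    scored = [
        (cosine_similarity(target_vec, Counter(tokenize(text))), text, tags)
        for text, tags in corpus
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def candidate_tags(target_text, corpus, k=2):
    # Pool the tags of the k most similar posts; duplicates are kept so that
    # later steps can use tag frequency.
    tags = []
    for _, _, post_tags in rank_posts(target_text, corpus, k):
        tags.extend(post_tags)
    return tags


# Toy corpus, invented for illustration.
corpus = [
    ("apple releases new iphone", ["apple", "iphone"]),
    ("baking an apple pie recipe", ["recipe", "baking"]),
    ("iphone battery life review", ["iphone", "review"]),
]
print(candidate_tags("new iphone review", corpus))
# → ['apple', 'iphone', 'iphone', 'review']
```

Keeping duplicate tags in the pooled list is a design choice that lets a later step treat "suggested by several similar posts" as evidence for a tag.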

Brief description of the method

The system retrieves blog posts relevant to the target post and ranks them by their cosine similarity with it; the tags assigned to the retrieved posts are collected as well. The tags are stemmed with the Porter stemmer, which reduces them to their morphological root form, and each tag is placed in a cluster indexed by that root form. To address polysemy, synonymy and level variation, the tags are then re-clustered based on their co-occurrence with the cluster's centroid (the most frequent tag in the cluster). A low co-occurrence frequency suggests that the two tags denote different concepts in the blogosphere, so the tag is moved into a separate cluster. Finally, the cluster centroids are scored using heuristic measures such as frequency, presence in the target post and occurrence in the training corpus.
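The stem, cluster, re-cluster and score steps above can be sketched as follows, under several stated assumptions. To stay dependency-free, `crude_stem` is a hypothetical suffix-stripping stand-in, not the Porter stemmer (a real implementation could use, for example, NLTK's `PorterStemmer`). Co-occurrence is counted over the tag lists of the retrieved posts. The split threshold `min_ratio`, the weights `w_freq` and `w_in_post`, and all function names are invented for illustration. The "occurrence in training corpus" heuristic is omitted.

```python
from collections import Counter, defaultdict


def crude_stem(tag):
    # Stand-in for the Porter stemmer (assumption): strips a few common suffixes.
    for suffix in ("ing", "ed", "s"):
        if tag.endswith(suffix) and len(tag) > len(suffix) + 2:
            return tag[: -len(suffix)]
    return tag


def cluster_by_root(tag_lists):
    # Group every observed tag under its stemmed root form.
    clusters = defaultdict(set)
    for tags in tag_lists:
        for tag in tags:
            clusters[crude_stem(tag)].add(tag)
    return clusters


def centroid_of(members, freq):
    # Centroid = most frequent tag; ties broken alphabetically for determinism.
    return max(members, key=lambda t: (freq[t], t))


def recluster(clusters, tag_lists, min_ratio=0.2):
    # Split off tags that rarely co-occur with their cluster's centroid.
    # min_ratio is a hypothetical threshold on co-occurrences / tag frequency.
    freq = Counter(tag for tags in tag_lists for tag in tags)
    result = []
    for members in clusters.values():
        centroid = centroid_of(members, freq)
        kept = {centroid}
        for tag in members - {centroid}:
            co = sum(1 for tags in tag_lists if tag in tags and centroid in tags)
            if co / freq[tag] >= min_ratio:
                kept.add(tag)
            else:
                result.append({tag})  # treated as a different concept
        result.append(kept)
    return result, freq


def score_centroids(clusters, freq, target_text, w_freq=1.0, w_in_post=2.0):
    # Score each centroid by its cluster's total frequency plus a bonus when it
    # appears in the target post; weights are hypothetical.
    words = set(target_text.lower().split())
    scores = {}
    for members in clusters:
        centroid = centroid_of(members, freq)
        cluster_freq = sum(freq[t] for t in members)
        bonus = w_in_post if centroid in words else 0.0
        scores[centroid] = w_freq * cluster_freq + bonus
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy tag lists from retrieved posts, invented for illustration.
tag_lists = [["post", "blog"], ["post", "posts"], ["post", "news"], ["posting"]]
reclustered, freq = recluster(cluster_by_root(tag_lists), tag_lists)
print(score_centroids(reclustered, freq, "A new post about my blog"))
# → [('post', 6.0), ('blog', 3.0), ('posting', 1.0), ('news', 1.0)]
```

In the toy data, "posts" stays with "post" because the two co-occur, while "posting" never appears alongside "post" and is split into its own cluster, mirroring the low co-occurrence rule described above.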

Experimental Result

Human judges evaluated the appropriateness of tags for posts. The system did not outperform the original, manually assigned tags, which is not surprising. However, the accuracy of the original tags was also fairly low, suggesting that people also struggle to tag blog posts well. This is understandable, since bloggers have little incentive to tag a post accurately for indexing or search. An automated evaluation on 1000 blog posts showed that the system outperforms the baseline in precision but underperforms it in recall. This experiment was carried out on the Technorati dataset.
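For reference, per-post precision and recall of suggested tags against a reference tag set can be computed as below. This is a generic sketch. The summary does not describe the paper's exact matching protocol (for example, whether tags are compared after stemming), so exact string matching and macro-averaging over posts are assumptions, and the function names are hypothetical.

```python
def precision_recall(suggested, reference):
    # Per-post precision and recall using exact string matching (assumption).
    suggested, reference = set(suggested), set(reference)
    hits = len(suggested & reference)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


def mean_precision_recall(results):
    # results: non-empty list of (suggested, reference) pairs, one per post.
    # Returns precision and recall averaged over posts (macro average).
    scores = [precision_recall(s, r) for s, r in results]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)


print(precision_recall(["iphone", "apple", "review"],
                       ["iphone", "review", "gadgets", "mobile"]))
# → (0.6666666666666666, 0.5)
```

A system that suggests only a few well-supported tags tends to score high precision but lower recall, which is one plausible reading of the trade-off reported above.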

Related papers

One of the first systems used for tagging was TagIt. An interesting related paper is Mishne, G. WWW 2006, which used collaborative filtering over related blog posts to suggest a set of tags for a target post.