Comparison: A Latent Variable Model for Geographic Lexical Variation and A probabilistic approach to spatiotemporal theme pattern mining on weblogs

Papers

Problem

Hassan et al. address the problem of ranking the blogs in a collection by their similarity to one another, in order to identify the representative blogs in a given set (usually a set organized around a particular topic). The task is similar to blog summarization.

Arguello et al. address the problem of blog retrieval: returning a ranked list of blogs relevant to a given user query.

The two papers pursue different goals. Hassan et al. propose methods for ranking blogs within a given topic collection; ranking blogs by their importance in such a collection can be useful for blog search. Arguello et al. experiment with several models aimed at improving blog search results.

Big Idea

The central ideas of the two papers differ because they address different problems, as described in the Problem section above. Both evaluate on a common dataset, but their results cannot be compared directly because the tasks differ.

Method

Hassan et al. use the BlogRank algorithm to rank blogs by popularity. BlogRank uses the lexical similarity between two blogs to decide whether to link the nodes that represent them in a graph. An iterative algorithm, similar to a random walk on this graph, then assigns a rank to each blog. The method also promotes diversity by penalizing blogs that are similar to a higher-ranked blog.
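The graph-based ranking described above can be illustrated with a minimal Python sketch. It is not the authors' BlogRank implementation: the bag-of-words cosine similarity, the edge threshold, the PageRank-style damping factor, and the greedy penalty used for diversity are all assumptions chosen for illustration, and `rank_blogs` is a hypothetical helper.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_blogs(texts, threshold=0.1, damping=0.85, iters=50, penalty=0.5):
    """Hypothetical BlogRank-style sketch: similarity graph, random walk, diversity penalty."""
    vecs = [Counter(t.lower().split()) for t in texts]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Assumption: link two blogs only when their lexical similarity reaches the threshold.
    adj = [[s if s >= threshold else 0.0 for s in row] for row in sim]

    # Power iteration of a damped random walk (PageRank-style) over the weighted graph.
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i in range(n):
            out = sum(adj[i])
            if out == 0:
                # Dangling node: spread its mass uniformly over all blogs.
                for j in range(n):
                    new[j] += damping * score[i] / n
            else:
                for j in range(n):
                    new[j] += damping * score[i] * adj[i][j] / out
        score = new

    # Assumed diversity step: greedily pick the best blog, then lower the
    # scores of remaining blogs in proportion to their similarity to it.
    current = score[:]
    remaining = set(range(n))
    order = []
    while remaining:
        best = max(remaining, key=lambda k: current[k])
        order.append(best)
        remaining.remove(best)
        for k in remaining:
            current[k] -= penalty * sim[best][k] * score[best]
    return order, score


texts = ["cats dogs pets", "cats dogs pets animals", "stock market trading", "cats pets"]
order, score = rank_blogs(texts)
```

In this toy input, the finance blog shares no vocabulary with the pet blogs, so it receives only the uniform teleport and dangling-node mass and ends up with the lowest raw score; the penalty step then lowers the scores of blogs that closely resemble one already placed higher.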

Arguello et al. use different blog representation models and query expansion techniques to improve blog retrieval. Their representation models either treat an entire blog as one large document or treat each blog post as a small document within a collection. For query expansion, they experiment with traditional pseudo-relevance feedback and with a method that expands the query using ranked anchor text from the Wikipedia corpus related to the base query.
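The two representation models and pseudo-relevance feedback can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's system: scoring uses Dirichlet-smoothed query likelihood, the small-document model averages post likelihoods with a uniform weight per post, and `prf_expand` is a simplified term-frequency stand-in for pseudo-relevance feedback rather than the exact feedback model the authors used. All function names are hypothetical, and the Wikipedia anchor-text expansion is not shown.

```python
import math
from collections import Counter


def lm_score(query, doc_terms, coll, mu=1000):
    """Query log-likelihood under a Dirichlet-smoothed unigram model (assumed scorer)."""
    tf = Counter(doc_terms)
    dlen = len(doc_terms)
    total = sum(coll.values())
    score = 0.0
    for q in query:
        # Add-one on collection counts so unseen query terms keep a nonzero probability.
        p_coll = (coll[q] + 1) / (total + len(coll))
        score += math.log((tf[q] + mu * p_coll) / (dlen + mu))
    return score


def large_document(query, blogs, coll):
    """Large-document model: concatenate all posts of a blog and score it as one document."""
    return {b: lm_score(query, [t for p in posts for t in p], coll) for b, posts in blogs.items()}


def small_document(query, blogs, coll):
    """Small-document model: score each post, then average likelihoods per blog.

    The uniform weight per post is an assumption of this sketch.
    """
    out = {}
    for b, posts in blogs.items():
        likes = [math.exp(lm_score(query, p, coll)) for p in posts]
        out[b] = math.log(sum(likes) / len(likes))
    return out


def prf_expand(query, ranked_docs, k=2, n_terms=3):
    """Simplified pseudo-relevance feedback: add the most frequent new terms of the top-k docs."""
    counts = Counter(t for d in ranked_docs[:k] for t in d if t not in query)
    return list(query) + [t for t, _ in counts.most_common(n_terms)]


blogs = {
    "pets": [["cats", "dogs"], ["cats", "pets"]],
    "finance": [["stock", "market"], ["trading", "stock"]],
}
coll = Counter(t for posts in blogs.values() for p in posts for t in p)
large = large_document(["cats"], blogs, coll)
small = small_document(["cats"], blogs, coll)
```

The large-document model lets a blog's many posts pool their term statistics, whereas the small-document model keeps each post's evidence separate before combining it, which is the contrast the paper's representation experiments explore.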

Dataset Used

Hassan et al. used the TREC BLOG06 and UCLA Blogocenter datasets for their experiments, whereas Arguello et al. used only the TREC BLOG06 dataset.

Other Discussions

Both papers report significant improvements over their baselines with the proposed methods. Together, the problems they address are essential to a better understanding of the blogosphere and should help with blog retrieval and summarization.
