Identifying influential bloggers: WSDM 2008
Citation
Nitin Agarwal, Huan Liu, Lei Tang, Philip S. Yu, "Identifying the Influential Bloggers in a Community", Proceedings of the International Conference on Web Search and Web Data Mining (WSDM), 2008.
Online version
Summary
This paper aims to identify the most influential bloggers in a blogging community. It first proposes a metric for assessing how influential a blog post is. The authors then run experiments on blogs from a few blog sites and evaluate the results qualitatively. Dataset information is available on the page Influential Blogger WSDM 2008 Dataset.
What makes a Blog influential
Recognition: An influential blog post is recognized by many, which can be judged by the number of in-links (\iota), i.e. the number of other posts that reference it.
Activity Generation: A blog post that generates more activity is presumed to be more influential. This is measured by the number of comments made on the post (\gamma).
Novelty: Novel ideas are presumed to be more influential [1]. A post that references many other posts (i.e. has more out-links) is presumed to contain fewer novel ideas, so novelty is taken to be negatively correlated with the number of out-links (\theta).
Eloquence: More eloquent posts are more influential [1]. The authors use the length of the blog post (\lambda) as a measure of eloquence.
Measuring Influence
The authors define a concept called InfluenceFlow, conjecturing that the flow of influence among blog posts can be modeled as a graph. For a post p with |\iota| in-links and |\theta| out-links, the InfluenceFlow is defined as:
InfluenceFlow(p) = w_{in} \sum_{m=1}^{|\iota|} I(p_m) - w_{out} \sum_{n=1}^{|\theta|} I(p_n)
Here w_{in} and w_{out} are adjustable weights for incoming and outgoing influence; p_m denotes a blog post that links to the post p, and p_n denotes a post to which p links; I(p_x) is the influence score of the post p_x. Note that, unfortunately, the paper does not say how the score I is computed from the four parameters discussed above.
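The definition above can be sketched in a few lines of Python. This is only an illustration, assuming that the influence scores of the neighbouring posts are already known and that the flow is the weighted incoming influence minus the weighted outgoing influence. The function name, default weights, and example scores are hypothetical.

```python
def influence_flow(in_scores, out_scores, w_in=1.0, w_out=1.0):
    """Weighted incoming influence minus weighted outgoing influence.

    in_scores:  I(p_m) for every post p_m that links to p
    out_scores: I(p_n) for every post p_n that p links to
    """
    # Assumption: the flow is the difference of the two weighted sums.
    return w_in * sum(in_scores) - w_out * sum(out_scores)


# Two posts link to p, and p links to one post. The scores are made up.
flow = influence_flow([0.5, 0.25], [0.125], w_in=1.0, w_out=2.0)
print(flow)  # 0.5
```

A larger w_out penalizes posts that mostly point to other posts, which matches the novelty argument given earlier.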
The authors further define the influence I of a post in terms of InfluenceFlow, which looks circular, since the score I has already been used in defining InfluenceFlow:
I(p) \propto w_{com} \gamma_p + InfluenceFlow(p)
where \gamma_p is the number of comments made on the post p, and w_{com} is a regulating coefficient.
For the constant of proportionality, the authors use a measure of the quality of the blog post. However, this measure is quite naive: it is simply a function w of the length \lambda of the post. So:
I(p) = w(\lambda) \times (w_{com} \gamma_p + InfluenceFlow(p))
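Because each post's score depends on the scores of its neighbours, one plausible way to evaluate it is a fixed-point iteration that starts from zero and repeatedly re-applies the definition. The sketch below makes that assumption; it is not confirmed to be the authors' procedure. The length weight `length_weight`, its cap of 1000, the data layout, and all weights are hypothetical, and the iteration is not guaranteed to converge for arbitrary graphs or weights.

```python
def length_weight(length, cap=1000):
    # Hypothetical w(lambda): grows linearly with post length, capped at 1.0.
    return min(length / cap, 1.0)


def influence(length, comments, flow, w_com=1.0):
    # Length-based weight times (weighted comment count + InfluenceFlow).
    return length_weight(length) * (w_com * comments + flow)


def iterate_influence(posts, links, w_in=1.0, w_out=1.0, w_com=1.0, rounds=60):
    """posts: {post_id: (length, comments)}
    links: set of (source, target) pairs, meaning source links to target.
    """
    scores = {post_id: 0.0 for post_id in posts}
    for _ in range(rounds):
        updated = {}
        for post_id, (length, comments) in posts.items():
            # Influence arriving from posts that link to this one.
            inflow = sum(scores[s] for s, t in links if t == post_id)
            # Influence leaving towards posts that this one links to.
            outflow = sum(scores[t] for s, t in links if s == post_id)
            flow = w_in * inflow - w_out * outflow
            updated[post_id] = influence(length, comments, flow, w_com)
        scores = updated
    return scores


# Toy data: post "b" links to post "a".
posts = {"a": (1000, 4), "b": (500, 2)}
links = {("b", "a")}
scores = iterate_influence(posts, links, w_in=0.5, w_out=0.5)
print(scores)
```

In this toy example the iteration settles close to a score of 4 for "a" and 0 for "b": the out-link to an influential post cancels the credit "b" earns from its comments.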
The authors further define iIndex for a blogger B as iIndex(B) = max(I(p_i)), where I(p_i) is the influence score of a post p_i made by blogger B. The higher a blogger’s iIndex, the more influential the blogger is considered.
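A minimal sketch of the blogger-level score, assuming that a blogger's index is the maximum influence score among that blogger's posts. The function names, post identifiers, and scores below are made up for illustration.

```python
def i_index(post_scores, post_author):
    """Map each blogger to the highest influence score among their posts."""
    best = {}
    for post_id, score in post_scores.items():
        author = post_author[post_id]
        best[author] = max(score, best.get(author, float("-inf")))
    return best


def rank_bloggers(index):
    # Most influential blogger first.
    return sorted(index, key=index.get, reverse=True)


post_scores = {"p1": 3.0, "p2": 7.5, "p3": 5.0}
post_author = {"p1": "alice", "p2": "alice", "p3": "bob"}
index = i_index(post_scores, post_author)
print(index)                 # {'alice': 7.5, 'bob': 5.0}
print(rank_bloggers(index))  # ['alice', 'bob']
```

Taking the maximum means a single highly influential post is enough to rank a blogger highly, regardless of how many posts they have written.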
Evaluation and Results
The authors evaluated their model on blog posts submitted to digg.com. Digg lets its users vote on blog posts, and the resulting score reflects how much a post is liked. The authors used this score to find the 100 most-liked posts and treated them as the influential posts against which to evaluate their model. They divided the bloggers into two categories, active and inactive, and identified influential and non-influential bloggers in each. They then took the top 20 influential posts for each of these categories of bloggers and counted how many of them appeared among the top 100 posts according to Digg’s votes. The numbers of hits are captured in the table below.
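The hit count described above amounts to a set overlap between the model's top posts and the reference list. A minimal sketch, with hypothetical post identifiers and much shorter lists than the top 20 and top 100 used in the paper:

```python
def count_hits(model_top, reference_top):
    """Count how many of the model's top posts appear in the reference list."""
    reference = set(reference_top)
    return sum(1 for post in model_top if post in reference)


model_top = ["p4", "p1", "p9"]       # posts ranked highest by the model
digg_top = ["p1", "p2", "p3", "p4"]  # posts ranked highest by Digg votes
print(count_hits(model_top, digg_top))  # 2
```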
References
[1] Ed Keller and Jon Berry. One American in ten tells the other nine how to vote, where to eat, and what to buy. They are The Influentials. The Free Press, 2003.