Difference between revisions of "Identifying influential bloggers: WSDM 2008"

Revision as of 16:12, 31 March 2011

Citation

Nitin Agarwal, Huan Liu, Lei Tang, Philip S. Yu, "Identifying the Influential Bloggers in a Community", Proceedings of the International Conference on Web Search and Web Data Mining (WSDM), 2008.

Online version

Available at CiteSeer

Summary

This paper aims to identify the most influential bloggers in a blogging community. It first proposes a metric for assessing how influential a blog post is. The authors then run experiments on blogs from a few blog sites and evaluate the results qualitatively.

What makes a Blog influential

Recognition: An influential blog post is recognized by many readers. This is judged by the number of in-links ( $\iota$ ), i.e. the number of other posts that reference the post.
Activity Generation: A blog post that generates more activity is presumed to be more influential. This is measured by the number of comments made on the post ( $\gamma$ ).
Novelty: Novel ideas are presumed to be more influential [1]. A post that references many other posts (i.e. has many out-links) is presumed to contain fewer novel ideas, so novelty is taken to be negatively correlated with the number of out-links ( $\theta$ ).
Eloquence: More eloquent posts are presumed to be more influential [1]. The authors use the length of the blog post ( $\lambda$ ) as a measure of eloquence.
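
The four signals above can be collected per post from a link list and some per-post counts. The following is a minimal sketch, not the authors' code: the names `PostSignals` and `collect_signals` are hypothetical, and treating post length as a word count is an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PostSignals:
    in_links: int   # iota: recognition
    comments: int   # gamma: activity generation
    out_links: int  # theta: more out-links is read as less novelty
    length: int     # lambda: eloquence proxy (assumed here to be a word count)


def collect_signals(links, comments, lengths):
    """Build the four per-post signals.

    links    -- list of (source_post, target_post) pairs
    comments -- dict mapping post -> number of comments
    lengths  -- dict mapping post -> post length; its keys define the post set
    """
    in_count = Counter(target for _, target in links)
    out_count = Counter(source for source, _ in links)
    return {
        post: PostSignals(
            in_links=in_count[post],
            comments=comments.get(post, 0),
            out_links=out_count[post],
            length=lengths[post],
        )
        for post in lengths
    }


links = [("a", "b"), ("c", "b"), ("b", "d")]
signals = collect_signals(links, {"b": 5}, {"a": 100, "b": 300, "c": 50, "d": 10})
```

Here post `b` is referenced by two posts, references one post, and has five comments.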

Measuring Influence

The authors define a concept called InfluenceFlow. They conjecture that the flow of influence among blog posts can be modeled as a graph. For a post p with $\iota$ in-links and $\theta$ out-links, InfluenceFlow is defined as:
$InfluenceFlow(p)=w_{in}\sum_{m=1}^{\iota}I(p_{m})-w_{out}\sum_{n=1}^{\theta}I(p_{n})$
where $w_{in}$ and $w_{out}$ are adjustable weights for incoming and outgoing influence; $p_m$ denotes a post that links to p, and $p_n$ denotes a post to which p links; $I(p_x)$ is the influence score of post $p_x$. Unfortunately, the paper doesn’t mention how the I score is computed from the four parameters discussed above. The authors further define the influence I of a post in terms of InfluenceFlow, which looks circular, since the I score has already been used in defining InfluenceFlow:
$I(p)\propto w_{com}\gamma _{p}+InfluenceFlow(p)$
where $\gamma_p$ is the number of comments made on post p, and $w_{com}$ is a regulating coefficient. For the constant of proportionality, the authors use a measure of the quality of the blog. However, this measure is quite naive: it is simply a function of the length of the blog post, $w(\lambda)$. So:
$I(p)=w(\lambda)\times (w_{com}\gamma _{p}+InfluenceFlow(p))$
The authors further define iIndex(B) for a blogger B as $\max_{i} I(p_{i})$, where $I(p_i)$ is the influence score of a post $p_i$ made by blogger B. The higher a blogger's iIndex, the more influential the blogger is considered.
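
Because I(p) depends on InfluenceFlow(p), which in turn depends on the I scores of neighbouring posts, the formulas cannot be evaluated in a single pass. One conventional way to handle such a mutual definition is fixed-point iteration: start from zero scores and re-apply the formula repeatedly. The sketch below does that. It is an assumption, not a procedure this summary attributes to the paper; the function names, the `quality` dictionary standing in for w(λ), the default weights, and the iteration count are all hypothetical, and convergence is not guaranteed for arbitrary weights.

```python
def influence_scores(posts, links, comments, quality,
                     w_in=0.5, w_out=0.5, w_com=1.0, iterations=50):
    """Iterate I(p) = w(lambda_p) * (w_com * gamma_p + InfluenceFlow(p)).

    posts    -- list of post ids
    links    -- list of (source_post, target_post) pairs
    comments -- dict post -> number of comments (gamma)
    quality  -- dict post -> w(lambda), a length-based quality weight
    """
    scores = {p: 0.0 for p in posts}
    for _ in range(iterations):
        updated = {}
        for p in posts:
            # Influence arriving from posts that link to p.
            inflow = sum(scores[src] for src, dst in links if dst == p)
            # Influence leaving towards posts that p links to.
            outflow = sum(scores[dst] for src, dst in links if src == p)
            flow = w_in * inflow - w_out * outflow
            updated[p] = quality[p] * (w_com * comments.get(p, 0) + flow)
        scores = updated
    return scores


def i_index(blogger_posts, scores):
    """iIndex(B): the highest influence score among a blogger's posts."""
    return max(scores[p] for p in blogger_posts)


posts = ["a", "b"]
links = [("a", "b")]  # post a links to post b
scores = influence_scores(posts, links, {"a": 2, "b": 4}, {"a": 1.0, "b": 1.0})
best = i_index(posts, scores)
```

With these toy inputs the iteration settles near I(a) = 0 and I(b) = 4, the solution of I(a) = 2 - 0.5 I(b) and I(b) = 4 + 0.5 I(a). Note that the subtraction of outgoing influence can drive a score to zero or below.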

Results

Two metrics were used: R-Precision and NDCG. NDCG is described in [2].
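
For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed. It uses the widely seen log2(rank + 1) discount for DCG, which is an assumption and may differ from the exact variant in [2] or in the paper; the function names are hypothetical.

```python
import math


def r_precision(ranked, relevant):
    """Fraction of the top-R ranked items that are relevant, with R = len(relevant)."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def dcg(gains):
    """Discounted cumulative gain of graded relevance values in ranked order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


rp = r_precision(["a", "b", "c", "d"], {"a", "c"})
perfect = ndcg([3, 2, 1])
swapped = ndcg([0, 1])
```

In the first call R = 2, and only one of the top two ranked items is relevant, so the R-Precision is 0.5. A ranking already in ideal order has an NDCG of 1.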

References

[1] D. Shen, Q. Yang, J.-T. Sun, and Z. Chen. Thread detection in dynamic text message streams. In Proc. of SIGIR ’06, pages 35–42, Seattle, Washington, 2006.
[2] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR ’00, pages 41–48, Athens, Greece, 2000.

