Zhang et al, WWW 2007

From Cohen Courses

Latest revision as of 21:30, 1 April 2011

Citation

Jun Zhang, Mark S. Ackerman, and Lada Adamic. 2007. Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th international conference on World Wide Web (WWW '07). ACM, New York, NY, USA, 221-230.

Online version

ACM (http://portal.acm.org/citation.cfm?id=1242603)

Summary

The aim of this paper is to identify users with high expertise within online expertise-sharing communities. The expertise finding system applies graph-based algorithms to the social network formed within the community.

The authors build a post-reply network in which each user is a node and a directed edge runs from the user who started a thread to each user who replied to it. Because of this construction, a user's prestige in the network is highly correlated with that user's expertise, so the network is called a community expertise network (CEN).
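
The construction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `build_cen`, the input format (a list of asker and replier names per thread), and the choices to ignore self-replies and to count each replier once per thread are all assumptions.

```python
from collections import defaultdict

def build_cen(threads):
    """Build a post-reply network from (asker, repliers) pairs.

    Returns a dict mapping asker -> {replier: number of threads in which
    that replier answered the asker}. Edges point from asker to replier.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for asker, repliers in threads:
        for replier in set(repliers):   # assumption: count a replier once per thread
            if replier != asker:        # assumption: ignore self-replies
                edges[asker][replier] += 1
    return {asker: dict(rs) for asker, rs in edges.items()}

# Hypothetical forum threads: (thread starter, users who replied).
threads = [
    ("alice", ["bob", "carol"]),
    ("dave", ["bob"]),
    ("bob", ["carol", "bob"]),
]
cen = build_cen(threads)
```

In this toy network carol receives edges but starts no thread, which is the pattern one would expect of a user who only answers.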

Network Characteristics

The authors experimented on the Java Forum, a large online help-seeking community. Before testing the algorithms, they performed several analyses to characterize the network. The analyses and their results are listed below:

  • Bow tie structure analysis : More than half of the users only ask questions; 13% only answer, and 12% both ask and answer.
  • Degree distribution analysis : The majority of users answer only a few questions, while a few active users answer a large number of questions.
  • Degree correlation analysis : Top repliers answer questions from everyone, but less expert users do not reply to highly expert users.

It is important to note that these characteristics differ from those of WWW graphs.
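
A rough way to reproduce the first two analyses on a small network is sketched below. It is a simplified role split (who only asks, who only answers, who does both) plus an in-degree count, not the full bow tie decomposition, which is normally defined through strongly connected components. The function names and the toy data are hypothetical.

```python
def classify_roles(cen):
    """Split users by role, given a dict mapping asker -> {replier: count}."""
    askers = set(cen)
    answerers = {r for repliers in cen.values() for r in repliers}
    return {
        "ask_only": askers - answerers,
        "answer_only": answerers - askers,
        "both": askers & answerers,
    }

def in_degree(cen):
    """Number of distinct users each replier has helped."""
    counts = {}
    for repliers in cen.values():
        for replier in repliers:
            counts[replier] = counts.get(replier, 0) + 1
    return counts

# Hypothetical post-reply network (asker -> repliers).
cen = {
    "alice": {"bob": 1, "carol": 1},
    "dave": {"bob": 1},
    "bob": {"carol": 1},
}
roles = classify_roles(cen)
degrees = in_degree(cen)
```

Plotting a histogram of the `degrees` values for a real forum would show the skewed distribution described above, with most users helping very few others.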

Expertise Ranking Algorithms

  • Simple statistical measures : Counting the number of replies a user posted, or the number of distinct users they helped, gives a basic expertise score.
  • Z-score : A measure that combines a user's asking and replying patterns.
  • ExpertiseRank : A PageRank-like algorithm that considers not only how many users one helped but also whom one helped.
  • HITS Authority : Based on the HITS algorithm, where a good hub is a user who is helped by many experts and a good authority is a user who helps many good hubs.
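
The measures listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The z-score is taken to be (a - q) / sqrt(a + q) for a answers and q questions, which is how the measure is usually stated; the summary itself does not spell out the formula. The ExpertiseRank sketch uses the standard unnormalized PageRank update with an assumed damping factor of 0.85, and the graph format (asker mapped to the set of users who answered them) is hypothetical.

```python
import math

def z_score(answers, questions):
    """Assumed form: positive for users who mostly answer, negative for askers."""
    total = answers + questions
    return 0.0 if total == 0 else (answers - questions) / math.sqrt(total)

def all_users(helped_by):
    return set(helped_by) | {h for hs in helped_by.values() for h in hs}

def expertise_rank(helped_by, damping=0.85, iterations=50):
    """PageRank-style score: an asker passes their score to those who helped them."""
    users = all_users(helped_by)
    rank = {u: 1.0 for u in users}
    for _ in range(iterations):
        new_rank = {u: 1.0 - damping for u in users}
        for asker, helpers in helped_by.items():
            if helpers:
                share = damping * rank[asker] / len(helpers)
                for helper in helpers:
                    new_rank[helper] += share
        rank = new_rank
    return rank

def hits_authority(helped_by, iterations=50):
    """HITS authority scores: askers act as hubs, helpers as authorities."""
    users = all_users(helped_by)
    hub = {u: 1.0 for u in users}
    auth = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the askers this user helped.
        auth = {u: 0.0 for u in users}
        for asker, helpers in helped_by.items():
            for helper in helpers:
                auth[helper] += hub[asker]
        # Hub: sum of the authority scores of the users who helped this asker.
        hub = {u: sum(auth[h] for h in helped_by.get(u, ())) for u in users}
        # Normalize both vectors so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {u: v / a_norm for u, v in auth.items()}
        hub = {u: v / h_norm for u, v in hub.items()}
    return auth

# Hypothetical network: asker -> users who answered them.
helped_by = {"alice": {"bob", "carol"}, "dave": {"bob"}, "bob": {"carol"}}
er = expertise_rank(helped_by)
auth = hits_authority(helped_by)
```

In this toy network ExpertiseRank places carol above bob because carol helped bob, who is himself a helper, while HITS authority scores the two equally. This is the sense in which ExpertiseRank takes "whom one helped" into account.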

In the experiments, the authors used Spearman's rho and Kendall's tau to measure the correlation between the rankings produced by these algorithms and human-assigned ratings. The two are highly correlated, which means that structural information can be used to identify experts in online communities.
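
To make the evaluation concrete, the two rank correlation measures can be computed as in the sketch below. It assumes there are no tied values, which keeps the formulas short; human ratings on a coarse scale usually contain ties, in which case tie-corrected implementations such as `scipy.stats.spearmanr` and `scipy.stats.kendalltau` would be a better choice. The scores and ratings are made-up numbers for illustration.

```python
from itertools import combinations

def ranks(values):
    """Rank positions (1 = smallest), assuming no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

# Hypothetical algorithm scores and human ratings for four users.
algorithm_scores = [0.9, 0.4, 0.7, 0.1]
human_ratings = [4, 3, 5, 1]
rho = spearman_rho(algorithm_scores, human_ratings)
tau = kendall_tau(algorithm_scores, human_ratings)
```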

The authors also observed that algorithms such as PageRank and HITS, which work very well on the WWW, do not outperform the simpler algorithms in this online community. This supports the idea that structural differences may be the reason why complex algorithms do not work as well on other network structures.

A simulated network model was created in which users make the best use of their time by selectively choosing questions that are challenging yet still within their ability to answer. Analysis of this network showed that ExpertiseRank and Z-score outperform the other algorithms, especially HITS. This shows that the performance of expertise ranking algorithms depends heavily on the dynamics of the community.
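
The idea behind such a simulation can be illustrated with a toy generator. Everything specific here is an assumption made for illustration and is not taken from the paper: the five expertise levels, the uniform choice of askers, the rule that a replier must be strictly more expert than the asker, and the weighting function that favors a small expertise gap (`best_gap`).

```python
import random

def simulate_cen(n_users=50, n_questions=200, best_gap=1, seed=0):
    """Generate (asker, replier) edges under a hypothetical selection rule."""
    rng = random.Random(seed)
    expertise = {u: rng.randint(1, 5) for u in range(n_users)}  # assumed levels 1-5
    edges = []
    for _ in range(n_questions):
        asker = rng.randrange(n_users)
        # Assumption: only users more expert than the asker are able to answer.
        candidates = [u for u in range(n_users) if expertise[u] > expertise[asker]]
        if not candidates:
            continue  # the question goes unanswered
        # Assumption: repliers favor questions whose expertise gap is close to
        # best_gap, i.e. questions that are challenging but still answerable.
        weights = [
            1.0 / (1 + abs((expertise[u] - expertise[asker]) - best_gap))
            for u in candidates
        ]
        replier = rng.choices(candidates, weights=weights, k=1)[0]
        edges.append((asker, replier))
    return expertise, edges

expertise, edges = simulate_cen()
```

Since the true expertise levels are known in a simulated network, the ranking algorithms can be scored directly against them rather than against human ratings.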

Expertise ranking algorithms may perform differently on networks with different structures, so understanding the structural characteristics of a network can make a significant difference in how well these algorithms perform.

Similar work includes Littlepage et al; Dom et al, DMKD 2003 addresses the same problem on email.