Difference between revisions of "The origin of bursts and heavy tails in human dynamics"

From Cohen Courses
Jump to navigationJump to search
(Created page with '== Citation == Albert-László Barabási. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211 (12 May 2005) == Online Version == [http://nd.edu/~networ…')
 
Line 9: Line 9:
 
== Summary ==
 
== Summary ==
  
This [[Category::paper]] applies [[UsesMethod::Random walk with restart]] as a method of [[AddressesProblem::Content recommendation|content recommendation]] in social network systems.  
+
This [[Category::paper]] argues that in contrast to a Poisson distribution, the waiting time of tasks will follow Pareto distribution if it is the case that people execute tasks based on some perceived priority.  
  
As it is in the context of social network, naturally and effectively, a graph based algorithm such as [[UsesMethod::Random walk with restart]] can be considered to perform the [[AddressesProblem::Content recommendation|recommendation]] task.
+
This paper discussed three different models and their corresponding task waiting time distribution:
  
This [[Category::paper]] argues that the extra knowledge provided by the users' social activity, such as the social annotation and friendships inherent in the social graph established among users, items and tags, can improve the performance of a recommendation system using methods such as [[UsesMethod::Random walk with restart]].
+
1. The first-in-first-out model. This model assumes that the tasks are first come first served. The tasks waiting time is based on all the tasks ahead of it, thus most of the tasks have approximately the same waiting time and have an exponential tail.
 +
2. The random order model. The tasks are pickup randomly to be executed. Such model also make the task waiting time distribution exponential.
 +
3. The priority model. People will execute the task with the highest priority in the task bucket first. By maths deduction and computer simulation, the author has the conclusion that such model will lead to a heavy-tail process: most of the tasks will be rapidly executed and a few will have long waiting times.
  
The argument is supported based on a series of comparison experiments between the [[UsesMethod::Random walk with restart]] model and a user-based [[Social_collaborative_filtering|collaborative filtering]] method using the Pearson Correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard [[Social_collaborative_filtering|collaborative filtering]] method.
+
The author believes that, according to the third model, the bursty nature of human dynamics is the consequence of the decision-make process by the human-being.
 +
== The dataset and experiment ==
 +
The author studied based on several thousands of [[UsesDataste::thousand.emails|emails]] with the information of senders, receivers, time and the size of each email.
 +
The following graph shows the distribution of time intervals between consecutive e-mails sent by a single user over three-month time interval. We can clearly se e the heavy-tailed activity pattern in the graph. The x-axis extends exponentially instead of linearly. The slope rate of the solid line is roughly -1.
 +
[[File:emails.sent.interval.time.png]]
 +
Similarly, the time taken by the user to reply a received message also follows the same pattern:
 +
[[File:emails.reply.interval.time.png]]
  
== The dataset ==
 
 
The source of [[UsesDataset::last.fm|dataset]] comes from last.fm - a Web 2.0 website. As introduced in the [[UsesDataset::last.fm|dataset]], there are three components serving as the vertices in the graph - users, music tracks and tags, deriving relation matrices as User-Track(UTr), User-Tag(UTg) and Track-Tag(TrTg).
 
 
The UTr sub-matrix consists of the playcount of every track by each user, i.e. the number of times it has been listened to.
 
 
The UU sub-matrix consists the average user playcount. This is crucial because the UTr sub-matrix may contain very large values and thus the binary values in UU will be suppressed.
 
 
In the case of the UTg sub-matrix, the associated tags for each user have been collected based on popularity. An exponential decay function is applied to the values of the tags of each user in the matrix, where the most popular tag gets the average user's playcount. The same process was performed for the TrTg sub-matrix accordingly.
 
 
The full social graph look like:
 
 
[[File:Last.fm.graph.png]]
 
 
== Experiment results and observations==
 
The experiment result is concluded as following table:
 
[[File:Last.fm.result.png]]
 
 
There are several observations from the result worth noticing:
 
 
*The addition of social network and social tagging information increases the number of relevant retrieved (num_rel_ret) tracks in the [[Random walk with restart|RWR]] method.
 
*There is also a notable decrease in MAP and Precision at high ranks and in parallel to num_rel_ret increase and Precision at lower ranks. This tendency accounts for the fact that with the progressive addition of social knowledge the method retrieves more tracks thus effectively increases its recall but at lower ranks (e.g P@200), causing a harm to its precision.
 
*the [[Random walk with restart|RWR]] method with the UTrUTg graph retrieves statistically significantly more relative tracks than the RWR method with the UTrUU graph (and also has higher P@200 and P@1000) compared to the simple [[Random walk with restart|RWR]] method.
 
The above three observation highlights the benifits of the social graph.
 
 
*Using the same sub-matrix as the input, the [[Random walk with restart|RWR]] method always outperforms the [[Social_collaborative_filtering | CF]] method. This supports the argument that the [[Random walk with restart|RWR]] method is more effective than the [[Social_collaborative_filtering | CF]] method.
 
 
*Using the [[Social_collaborative_filtering | CF]] method, the addition of friendships and social tagging information deteriorates the performance increasingly. This supports the argument that the memory based [[Social_collaborative_filtering | CF]] method cannot provide with adequate non-trivial mechanisms to incorporate social knowledge.
 
 
== Related papers ==
 
*[[RelatedPaper:: Bu et al. MM 2010|Music Recommendation by Unified Hypergraph: Combining Social Media Information and Music Content]]
 
Interesting article also leverages the social media information and the music content itself to recommend music.
 
*[[RelatedPaper:: Yildirim and Krishnamoorthy MM RecSys 2008|A random walk method for alleviating the sparsity problem in collaborative filtering]]
 
Another recommendation algorithm using random walk.
 
*[[RelatedPaper: Yuan et al. RecSys 2009|Augmenting collaborative recommender by fusing explicit social relationships]]
 
Two principal methods to integrate explicit social relationships into traditional CF methods: the weighted-similarity fusion and the graph fusion.
 
  
 
== Study plan ==
 
== Study plan ==
* Web page: [[Random walk with restart]]
+
Basically it is a straight forward article just requiring some basic statistics knowledge.
* Slides: [http://www.cs.cmu.edu/~wcohen/collab-filtering-tutorial.ppt Collaborative Filtering: A Tutorial]
+
* Poisson Distribution: [wiki link here]
* Article: [http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient Pearson correlation]
+
* Pareto Distribution: [wiki link here]

Revision as of 10:24, 6 November 2012

Citation

Albert-László Barabási. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211 (12 May 2005)

Online Version

online version

Summary

This paper argues that in contrast to a Poisson distribution, the waiting time of tasks will follow Pareto distribution if it is the case that people execute tasks based on some perceived priority.

This paper discussed three different models and their corresponding task waiting time distribution:

1. The first-in-first-out model. This model assumes that the tasks are first come first served. The tasks waiting time is based on all the tasks ahead of it, thus most of the tasks have approximately the same waiting time and have an exponential tail. 2. The random order model. The tasks are pickup randomly to be executed. Such model also make the task waiting time distribution exponential. 3. The priority model. People will execute the task with the highest priority in the task bucket first. By maths deduction and computer simulation, the author has the conclusion that such model will lead to a heavy-tail process: most of the tasks will be rapidly executed and a few will have long waiting times.

The author believes that, according to the third model, the bursty nature of human dynamics is the consequence of the decision-make process by the human-being.

The dataset and experiment

The author studied based on several thousands of emails with the information of senders, receivers, time and the size of each email. The following graph shows the distribution of time intervals between consecutive e-mails sent by a single user over three-month time interval. We can clearly se e the heavy-tailed activity pattern in the graph. The x-axis extends exponentially instead of linearly. The slope rate of the solid line is roughly -1. Emails.sent.interval.time.png Similarly, the time taken by the user to reply a received message also follows the same pattern: Emails.reply.interval.time.png


Study plan

Basically it is a straight forward article just requiring some basic statistics knowledge.

  • Poisson Distribution: [wiki link here]
  • Pareto Distribution: [wiki link here]