Difference between revisions of "Lin et al KDD 2010"

From Cohen Courses
 
(16 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
== Citation ==
 
== Citation ==
  
[http://www.cs.uiuc.edu/~hanj/pdf/kdd10_xlin.pdf PET: A Statistical Model for Popular Events Tracking in Social Communities]
+
[http://www.cs.uiuc.edu/~hanj/pdf/kdd10_xlin.pdf PET: A Statistical Model for Popular Events Tracking in Social Communities]. Cindy Xide Lin, Bo Zhao, [http://www-personal.umich.edu/~qmei/ Qiaozhu Mei], [http://www.cs.uiuc.edu/~hanj/ Jiawei Han]. KDD, 2010.
 
 
== Authors ==
 
 
 
Cindy Xide Lin, Bo Zhao, [http://www-personal.umich.edu/~qmei/ Qiaozhu Mei],  [http://www.cs.uiuc.edu/~hanj/ Jiawei Han]
 
  
 
== Problem ==
 
== Problem ==
  
In this paper, the authors propose a method to [[AddressesProblem::observe and track the popular events or topics]] that evolve over time in online communities. Existing methods treat topics and network structure separately. In this paper, [[UsesMethod::textual topics and network are combined]], which makes more sense.
+
In this paper, the authors try to [[AddressesProblem::observe and track the popular events or topics]] that evolve over time in the communities.
 
 
== Method ==
 
The authors address event tracking by first defining the task of [[UsesMethod::Popular Event Tracking]] (PET) in online communities, which covers the popularity of events over time, the burstiness of user interest, information diffusion through the network structure, and the evolution of topics.
 
 
 
== Details about the Model ==
 
 
 
First, the authors introduce two reasonable independence assumptions:
 
 
 
1. Given the current network structure and the previous interest status, the current interest status is independent of the document collection. This is a cause-effect assumption: people generate documents at a given moment as a result of their current interest, not as a cause of it.
 
 
 
2. Given the current interest status and the document, the current topic model is independent of the network structure and the previous interest status. The reason is that once a user has a certain level of interest in some event, the content he produces depends only on the event and the interest level.
 
 
 
Based on the above assumptions, the inference target becomes: <math>P(H_k,\theta_k|G_k,D_k,H_{k-1})= P(H_k|G_k,H_{k-1})P(\theta_k|H_k,D_k)</math>
 
The first part, <math>P(H_k|G_k,H_{k-1})</math>, corresponds to the first assumption and is called the '''interest model''' because it measures the distribution of interest. The second part, <math>P(\theta_k|H_k,D_k)</math>, corresponds to the second assumption and is called the '''topic model''' because it deals with the topic distribution.
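
The step from the joint distribution to this product can be made explicit. The following is a short derivation sketch that uses only the chain rule and the two assumptions stated above:

<math>P(H_k,\theta_k|G_k,D_k,H_{k-1}) = P(H_k|G_k,D_k,H_{k-1})\,P(\theta_k|H_k,G_k,D_k,H_{k-1})</math>

By assumption 1, <math>P(H_k|G_k,D_k,H_{k-1}) = P(H_k|G_k,H_{k-1})</math>. By assumption 2, <math>P(\theta_k|H_k,G_k,D_k,H_{k-1}) = P(\theta_k|H_k,D_k)</math>. Substituting both into the chain-rule expansion gives the product of the interest model and the topic model.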
 
 
 
The authors use a [[UsesMethod::Gibbs Random Field]] as the interest model and define its energy function as <math>U(H_k)=\sum^N_{i=1}V_i(h_k(i))+\sum^N_{i=1}V'_i(h_k(i),h_k(-i))</math>, where <math>-i</math> refers to all vertices except <math>i</math>.
 
 
 
<math>V_i(h_k(i))=(h_k(i)-h_{k-1}(i))^2</math>. This is regarded as the transition energy. Minimizing this cost keeps each user's interest value generally consistent between consecutive time points. This definition corresponds to observation 2.
 
 
 
<math>V'_i(h_k(i),h_k(-i))=\lambda_{k,i}(h_k(i)-h'_k(i))^2</math>, where <math>h'_k(i)</math> is the expectation of <math>h_k(i)</math> estimated from user i's neighbors. This corresponds exactly to observation 1.
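
As a rough illustration of how the two energy terms combine, here is a minimal Python sketch that evaluates <math>U(H_k)</math> for a small network. The function names, the dense weight matrix, and in particular the choice of <math>h'_k(i)</math> as a weighted mean of the neighbors' current interest are assumptions made for this example; this summary does not say how the paper estimates <math>h'_k(i)</math>.

```python
from typing import Sequence


def neighbor_expectation(h_curr: Sequence[float],
                         weights: Sequence[Sequence[float]],
                         i: int) -> float:
    """Assumed form of h'_k(i): weighted mean of the neighbors' current interest."""
    total = sum(w for j, w in enumerate(weights[i]) if j != i)
    if total == 0:
        # Isolated user: no neighbor signal, so the neighbor term contributes 0.
        return h_curr[i]
    weighted = sum(w * h_curr[j] for j, w in enumerate(weights[i]) if j != i)
    return weighted / total


def gibbs_energy(h_curr: Sequence[float],
                 h_prev: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 lam: Sequence[float]) -> float:
    """U(H_k) = sum_i V_i(h_k(i)) + sum_i V'_i(h_k(i), h_k(-i))."""
    energy = 0.0
    for i in range(len(h_curr)):
        # Transition energy: squared change from the previous time point.
        transition = (h_curr[i] - h_prev[i]) ** 2
        # Neighbor energy: squared gap to the neighbor-based expectation.
        expected = neighbor_expectation(h_curr, weights, i)
        neighbor = lam[i] * (h_curr[i] - expected) ** 2
        energy += transition + neighbor
    return energy
```

A configuration in which every user keeps the previous interest value and agrees with all neighbors has zero energy; any change over time or disagreement with neighbors raises it.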
 
 
 
Then, based on observation 3, the authors define the topic model <math>p(\theta^E_k|d_{k,i})=h_k(i)</math>, which means that a higher interest of user i in a certain event leads to a higher proportion of his text belonging to that event.
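
One way to read this definition is as a two-component mixture in which <math>h_k(i)</math> is the weight of the event topic in user i's document. The sketch below follows that reading. The background topic that receives the remaining weight <math>1-h_k(i)</math>, the dictionary representation of topics, and the function names are assumptions made for this example and are not taken from this summary.

```python
import math


def word_probability(word, interest, event_topic, background_topic):
    """p(word | d_{k,i}) under a mixture weighted by the interest h_k(i)."""
    p_event = event_topic.get(word, 0.0)
    p_background = background_topic.get(word, 0.0)
    return interest * p_event + (1.0 - interest) * p_background


def document_log_likelihood(words, interest, event_topic, background_topic):
    """Sum of log word probabilities; -inf if any word has zero probability."""
    total = 0.0
    for word in words:
        p = word_probability(word, interest, event_topic, background_topic)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total
```

Under this reading, raising a user's interest shifts probability mass toward words that are frequent in the event topic.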
 
  
 +
== Summary ==
  
 +
Existing methods treat topics and network structure separately. This paper combines textual topics and network structure, which makes more sense. The authors address event tracking with a model, [[UsesMethod::Popular Event Tracking]] (PET), for online communities. It covers the popularity of events over time, the burstiness of user interest, information diffusion through the network structure, and the evolution of topics.
  
 
== Dataset ==
 
== Dataset ==
  
In this paper, the authors select [[UsesDataset::twitter]] as their source of data. They choose 5,000 users with follower-followee relationships and crawl 1,438,826 tweets displayed by these users from October 2009 to early January 2010. Each day is regarded as a time point. A document is obtained by simply concatenating all tweets displayed by a user on a given day. The connection between two users is defined as the number of tweets displayed by one user through following the other during a period of 30 days.
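
The document construction described above can be sketched as follows. This is a minimal illustration, assuming tweets arrive as (user_id, day, text) tuples; that record layout and the function name are hypothetical and are not part of the paper's pipeline.

```python
from collections import defaultdict


def build_daily_documents(tweets):
    """Group tweets by (user, day) and join each group into one document.

    `tweets` is an iterable of (user_id, day, text) tuples, a hypothetical
    layout chosen for this example.
    """
    grouped = defaultdict(list)
    for user_id, day, text in tweets:
        grouped[(user_id, day)].append(text)
    # Concatenate the texts in the order in which they were supplied.
    return {key: " ".join(texts) for key, texts in grouped.items()}
```

Each key of the returned dictionary identifies one user at one time point, which matches the one-document-per-user-per-day setup.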
+
In this paper, the authors select [[UsesDataset::twitter]] as their source of data.
  
 +
== Evaluation ==
  
 +
The authors compare PET with baseline models such as JonK, Cont, BOM and GInt. They apply these models to analyze both the popularity trend and network diffusion. The [[result]] shows that PET generates the most consistent trends and the smoothest diffusion.
  
== Comparison with baseline models ==
+
== Related Papers ==
  
The authors compare PET with baseline models such as JonK, Cont, BOM and GInt. They apply these models to analyze both the popularity trend and network diffusion. The conclusion is that PET generates the most consistent trends and the smoothest diffusion, because PET estimates popularity by considering historical, textual and structural information in a unified way.
+
[1] L. A. Adamic and E. Adar. [[RelatedPaper::Friends and neighbors on the web]].
  
The popularity trend is shown in Fig.1. Network diffusion is shown in Fig.2.
+
[2] L. Araujo and J. A. Cuesta. [[RelatedPaper::Genetic algorithm for burst detection and activity tracking in event streams]].
[[File:Result.jpg]]
 
[[File:Result2.jpg]]
 

Latest revision as of 23:27, 6 February 2011

This is a paper I read in 10802 Social Media Analysis.

Citation

PET: A Statistical Model for Popular Events Tracking in Social Communities. Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han. KDD, 2010.

Problem

In this paper, the authors try to observe and track the popular events or topics that evolve over time in the communities.

Summary

Existing methods treat topics and network structure separately. This paper combines textual topics and network structure, which makes more sense. The authors address event tracking with a model, Popular Event Tracking (PET), for online communities. It covers the popularity of events over time, the burstiness of user interest, information diffusion through the network structure, and the evolution of topics.

Dataset

In this paper, the authors select twitter as their source of data.

Evaluation

The authors compare PET with baseline models such as JonK, Cont, BOM and GInt. They apply these models to analyze both the popularity trend and network diffusion. The result shows that PET generates the most consistent trends and the smoothest diffusion.

Related Papers

[1] L. A. Adamic and E. Adar. Friends and neighbors on the web.

[2] L. Araujo and J. A. Cuesta. Genetic algorithm for burst detection and activity tracking in event streams.