Difference between revisions of "Leman Akoglu et al KDD'10"

From Cohen Courses
Jump to navigationJump to search
(Created page with '==Citation == Seshadri, Mukund and Machiraju, Sridhar and Sridharan, Ashwin and Bolot, Jean and Faloutsos, Christos and Leskove, Jure. Mobile call graphs: beyond power-law and lo…')
 
 
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
==Citation ==
 
==Citation ==
Seshadri, Mukund and Machiraju, Sridhar and Sridharan, Ashwin and Bolot, Jean and Faloutsos, Christos and Leskove, Jure.
+
Leman Akoglu, Bhavana Dalvi Structure, Tie Persistence and Event Detection in Large Phone and SMS Networks
Mobile call graphs: beyond power-law and lognormal distributions.
+
In KDD'10
In KDD'08
 
  
 
==Summary==
 
==Summary==
  
This is a [[paper]] about model fitting. The authors found that [[power law]], which is widely used in the society of social media analysis, sometimes might not fit the data well. The authors developed a method to fit the data to a Double Pareto LogNormal (DPLN) distribution instead. They demonstrated their method through a massive phone call dataset which has more than one million users.
+
This [[paper]] focuses on a large phone call data set from an Indian phone company. The paper tries to answer three questions: First, what is the structural property of the dataset? What are the relationships between different quantities taken from the egonet? Second, if a link exists between two people, will the link still exist in the future? Third, how to detect change-point anomalies?
  
 
==Brief Description==
 
==Brief Description==
The authors are interested in three metrics: the number of phone calls per customer, the total talk minutes per customer and the number of callers per customer. The first and last metrics are related to out-degrees, and the second metric is unique in a phone call dataset.
 
  
The DPLN is based on Geometric Brownian Motion, which is widely used in financial field in modeling the stock movement as time passes by. The formula for a Geometric Brownian Motion is:
+
The authors considered tie attributes and node attributes in the analysis. For tie attributes, they considered reciprocity and topology overlap. For node attributes, they considered node degrees, Cluster coefficients and user reciprocity defined as the proportion of reciprocity ties.
  
<math>
+
We first talk about general network property of the data. The authors found that the weights on mutual node pairs in Mobile Call Graph or Mobile Text graph are both small and even. They also found that nodes with high degree tend to connect to other high-degree nodes; node strength grows in a power law manner comparing to the node degree. Moreover, node strength increases as the number of overlapping neighbors between two nodes increases.
dS_t=\mu S_t dt+\sigma S_t dw_t
 
</math>
 
  
Here <math>w_t</math> represents a wiener process, which is a continuous markov process with independent increment. The randomness comes from this term. we can understand <math>\mu</math> as the drift term and <math>\sigma</math> as the variance term. One important result for Geometric Brownian Motion is that <math> \frac{S_t}{S_0}</math> follows a log-normal distribution.  
+
Moreover, the authors attacked the tie persistence problem using logistic regression. They compared their method with the traditional LR approach as well.
  
By introducing the randomness the authors obtain a very good fit for the data.
+
The authors also looked at the change-point behavior. Specifically, they are interested in detecting the time stamp where the behavior of each node changes. They first extracted features including degree, weight, reciprocity edges, triangles in egonet etc. They next considered an eigen decomposition approach using tensor. The third dimension is the time. After this step each node can be represented as a matrix changing over time. By taking eigenvectors over time, they are able to identify different behaviors using dot product between these vectors.
 
 
The fit can be used for anomaly detection and pricing structural design. The authors are also interested in the evolution of data over time (They refer it by a generative process). They sliced the time into discrete pieces, and looked at the ratio of each metrics over time. Based on the result of the fit, a lognormal multiplicative process fits the data very well.
 

Latest revision as of 22:17, 15 February 2011

Citation

Leman Akoglu, Bhavana Dalvi Structure, Tie Persistence and Event Detection in Large Phone and SMS Networks In KDD'10

Summary

This paper focuses on a large phone call data set from an Indian phone company. The paper tries to answer three questions: First, what is the structural property of the dataset? What are the relationships between different quantities taken from the egonet? Second, if a link exists between two people, will the link still exist in the future? Third, how to detect change-point anomalies?

Brief Description

The authors considered tie attributes and node attributes in the analysis. For tie attributes, they considered reciprocity and topology overlap. For node attributes, they considered node degrees, Cluster coefficients and user reciprocity defined as the proportion of reciprocity ties.

We first talk about general network property of the data. The authors found that the weights on mutual node pairs in Mobile Call Graph or Mobile Text graph are both small and even. They also found that nodes with high degree tend to connect to other high-degree nodes; node strength grows in a power law manner comparing to the node degree. Moreover, node strength increases as the number of overlapping neighbors between two nodes increases.

Moreover, the authors attacked the tie persistence problem using logistic regression. They compared their method with the traditional LR approach as well.

The authors also looked at the change-point behavior. Specifically, they are interested in detecting the time stamp where the behavior of each node changes. They first extracted features including degree, weight, reciprocity edges, triangles in egonet etc. They next considered an eigen decomposition approach using tensor. The third dimension is the time. After this step each node can be represented as a matrix changing over time. By taking eigenvectors over time, they are able to identify different behaviors using dot product between these vectors.