Difference between revisions of "User:Diliu"

From Cohen Courses
 
(One intermediate revision by the same user not shown)
  
 
With regard to the project, we are interested in analyzing a large phone call dataset. The data contains the phone records over half a year, together with SMS messages. We would like to focus on anomaly detection.
 
 +
=Homework2=
  
 
[[Seshadri et at KDD'08|Seshadri et al. KDD'08]]
 
  
In this paper, the authors investigate node and edge properties of a massive phone call dataset. They focus on three metrics: the number of phone calls per customer, the total talk minutes per customer, and the number of callers per customer.
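
As a rough illustration of the three metrics, the sketch below computes them from a list of call records. The record layout (caller, callee, minutes) and the function name are assumptions made for this example and are not taken from the paper. Each call is attributed to the caller, and "number of callers" is read here as the number of distinct parties on the other end of the call.

```python
from collections import defaultdict


def per_customer_metrics(records):
    """Compute (number of calls, total minutes, distinct partners) per customer.

    `records` is an iterable of (caller, callee, minutes) tuples.
    This layout is hypothetical; real call records will differ.
    """
    calls = defaultdict(int)       # number of phone calls per customer
    minutes = defaultdict(float)   # total talk minutes per customer
    partners = defaultdict(set)    # distinct parties each customer called

    for caller, callee, mins in records:
        calls[caller] += 1
        minutes[caller] += mins
        partners[caller].add(callee)

    return {c: (calls[c], minutes[c], len(partners[c])) for c in calls}


if __name__ == "__main__":
    toy_records = [
        ("a", "b", 2.0),
        ("a", "c", 3.0),
        ("a", "b", 1.5),
        ("b", "a", 4.0),
    ]
    for customer, stats in sorted(per_customer_metrics(toy_records).items()):
        print(customer, stats)
```

The distributions studied in the paper would then be the histograms of each of these three values across all customers.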
+
[[Leman Akoglu et al KDD'10]]
 
 
The major contribution of this paper is the finding that the power law, which is widely used in the social media analysis community, does not fit the data well. Instead, the authors developed a method to fit the data to a Double Pareto LogNormal (DPLN) distribution.
 
 
 
The DPLN is based on geometric Brownian motion (GBM), which is widely used in finance to model stock price movements. GBM is driven by a Wiener process, and the exponential term guarantees that the values always stay positive. The name "lognormal" comes from both parts: the Wiener process has normally distributed increments, and the exponential of a normal variable is lognormal.
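
To make the role of the exponential term concrete, here is a minimal simulation sketch of geometric Brownian motion. It uses the standard exact update S(t+dt) = S(t) * exp((mu - sigma^2/2) * dt + sigma * sqrt(dt) * Z) with Z standard normal. The function name and the parameter values are assumptions chosen for illustration and do not come from the paper.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, dt, n_steps, rng):
    """Simulate one path of geometric Brownian motion.

    s0: starting value (must be positive)
    mu, sigma: drift and volatility (illustrative values only)
    dt: length of one time step
    rng: a random.Random instance, passed in for reproducibility
    """
    path = [s0]
    for _ in range(n_steps):
        # Increment of the Wiener process over dt: normal with variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Multiplying by an exponential keeps the path strictly positive.
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw)
        path.append(path[-1] * growth)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, dt=0.01, n_steps=1000, rng=rng)
    print("minimum value on the path:", min(path))
    print("all positive:", all(x > 0 for x in path))
```

Because the logarithm of the path is a sum of normal increments, log S(t) is normal at any fixed time t, so S(t) itself is lognormal.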
 
 
 
The fit can be used for anomaly detection and for designing pricing structures. The authors are also interested in how the data evolve over time (they refer to this as a generative process). They sliced time into discrete intervals and looked at the ratio of each metric over time. Based on the result of the fit, a lognormal multiplicative process describes the data very well.
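
One simple way to look at such ratios is sketched below. It assumes a multiplicative process of the form x[t+1] = x[t] * e[t], where e[t] is lognormal, so the log of each ratio between consecutive time slices should look normal. The function names and the toy series are assumptions for illustration, and this is not necessarily the procedure the authors used.

```python
import math
import statistics


def log_ratios(series):
    """Return log(x[t+1] / x[t]) for consecutive positive values of a metric."""
    return [
        math.log(b / a)
        for a, b in zip(series, series[1:])
        if a > 0 and b > 0  # skip slices where the ratio is undefined
    ]


def fit_log_multiplier(series):
    """Estimate the mean and standard deviation of the log multiplier.

    Under a lognormal multiplicative process these two numbers are the
    parameters of the normal distribution followed by the log-ratios.
    """
    ratios = log_ratios(series)
    return statistics.mean(ratios), statistics.stdev(ratios)


if __name__ == "__main__":
    # Hypothetical talk minutes of one customer over successive time slices.
    minutes_per_slice = [120.0, 135.0, 110.0, 150.0, 160.0, 140.0]
    mean_log, std_log = fit_log_multiplier(minutes_per_slice)
    print("mean of log-ratios:", mean_log)
    print("std of log-ratios:", std_log)
```

A customer whose log-ratios fall far outside the fitted normal range would be a natural candidate to inspect as an anomaly.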
 

Latest revision as of 22:35, 15 February 2011

This is Di Liu's wiki page. I am a Ph.D. student in the Department of Statistics at Carnegie Mellon.

In the 10802 class, I am working with Daniel Percival on a mobile phone call dataset. Daniel Percival is also a Ph.D. student in the Department of Statistics.

With regard to the project, we are interested in analyzing a large phone call dataset. The data contains the phone records over half a year, together with SMS messages. We would like to focus on anomaly detection.

Homework2

Seshadri et al. KDD'08

Leman Akoglu et al KDD'10