User:Diliu

From Cohen Courses
Revision as of 18:47, 4 February 2011 by Diliu

This is Di Liu's wiki page. I am a Ph.D. student in the Department of Statistics at Carnegie Mellon.

In the 10802 class, I am working with Daniel Percival, also a Ph.D. student in the Department of Statistics, on a mobile phone call dataset.

For the project, we plan to analyze a large phone call dataset containing six months of call records together with SMS messages. Our focus is anomaly detection.

Seshadri et al., KDD'08

In this paper, the authors investigate node and edge properties of a massive phone call dataset. They focus on three per-customer quantities: the number of phone calls, the total talk time in minutes, and the number of callers.
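All three per-customer quantities can be computed in a single pass over the call records. The sketch below assumes a hypothetical record format `(caller_id, callee_id, duration_seconds)`, counts each call toward both endpoints, and treats "number of callers" as the number of distinct calling partners; the paper's exact definitions may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format (an assumption, not the dataset's schema):
# (caller_id, callee_id, duration_seconds)
CallRecord = Tuple[str, str, float]


def per_customer_stats(records: Iterable[CallRecord]) -> Dict[str, dict]:
    """Number of calls, total talk minutes, and distinct partners per customer."""
    calls = defaultdict(int)
    seconds = defaultdict(float)
    partners = defaultdict(set)
    for caller, callee, duration in records:
        # Assumption: a call counts toward both the caller and the callee.
        for me, other in ((caller, callee), (callee, caller)):
            calls[me] += 1
            seconds[me] += duration
            partners[me].add(other)
    return {
        c: {
            "num_calls": calls[c],
            "talk_minutes": seconds[c] / 60.0,
            "num_partners": len(partners[c]),
        }
        for c in calls
    }


records = [("a", "b", 120), ("a", "c", 60), ("b", "a", 30)]
stats = per_customer_stats(records)
# stats["a"] -> {"num_calls": 3, "talk_minutes": 3.5, "num_partners": 2}
```

Counting only outgoing calls (or only incoming ones for "callers") would change just the inner loop.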

The main finding of this paper is that the power law, which is widely used in the social media analysis community, does not fit these distributions well. Instead, the authors develop a method to fit the data with a Double Pareto LogNormal (DPLN) distribution.
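To make the DPLN concrete, here is a minimal sketch of its density and log-likelihood in the standard (α, β, ν, τ) parameterization of Reed and Jorgensen. Here α governs the power-law upper tail, β the behavior near zero, and ν, τ the location and scale of the lognormal body on the log scale. Whether Seshadri et al. use exactly this parameterization and estimator is an assumption. A maximum-likelihood fit would maximize `dpln_loglik`, for example by passing its negative to a general-purpose optimizer such as `scipy.optimize.minimize`.

```python
import math
from typing import Iterable


def _norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _norm_sf(z: float) -> float:
    """Standard normal upper tail 1 - Phi(z), computed without cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dpln_pdf(x: float, alpha: float, beta: float, nu: float, tau: float) -> float:
    """Density of DPLN(alpha, beta, nu, tau) at x; zero for x <= 0."""
    if x <= 0:
        return 0.0
    y = math.log(x)
    # Component behaving like x^(-alpha-1) for large x (upper Pareto tail).
    upper = math.exp((-alpha - 1.0) * y + alpha * nu + 0.5 * alpha**2 * tau**2) \
        * _norm_cdf((y - nu - alpha * tau**2) / tau)
    # Component behaving like x^(beta-1) for small x (lower Pareto tail).
    lower = math.exp((beta - 1.0) * y - beta * nu + 0.5 * beta**2 * tau**2) \
        * _norm_sf((y - nu + beta * tau**2) / tau)
    return alpha * beta / (alpha + beta) * (upper + lower)


def dpln_loglik(data: Iterable[float], alpha: float, beta: float,
                nu: float, tau: float) -> float:
    """Log-likelihood of positive observations; raises if a density underflows to 0."""
    return sum(math.log(dpln_pdf(x, alpha, beta, nu, tau)) for x in data)
```

On a log-log plot this density is close to a straight line in each tail but curved in the middle, which is the kind of shape a single power law cannot capture.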

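The DPLN is based on Geometric Brownian Motion (GBM), which is widely used in finance to model stock prices. A GBM is the exponential of a Wiener process with drift. The exponential guarantees that the values are always positive. Because the Wiener process at a fixed time is normally distributed, the logarithm of a GBM value is normal, which is where the "lognormal" in the name comes from. A DPLN variable arises when a GBM, started from a lognormally distributed value, is observed after an exponentially distributed amount of time.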
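The GBM construction can be checked by simulation. This sketch uses the exact solution X_t = X_0 exp((mu - sigma^2/2) t + sigma W_t) with W_t ~ N(0, t), so every draw is positive and log X_t is normal. The second function observes a GBM with a lognormal start at an exponential time T ~ Exp(lam), which yields DPLN draws. The helper `dpln_tail_exponents` assumes the standard result that α and -β are the roots of (sigma^2/2) z^2 + (mu - sigma^2/2) z - lam = 0. All function and parameter names are illustrative.

```python
import math
import random
from typing import List, Tuple


def gbm_sample(x0: float, mu: float, sigma: float, t: float,
               n: int, rng: random.Random) -> List[float]:
    """Exact draws of X_t for dX = mu X dt + sigma X dW with X_0 = x0."""
    out = []
    for _ in range(n):
        w_t = rng.gauss(0.0, math.sqrt(t))  # Wiener process at time t ~ N(0, t)
        out.append(x0 * math.exp((mu - 0.5 * sigma**2) * t + sigma * w_t))
    return out


def gbm_at_exponential_time(nu: float, tau: float, mu: float, sigma: float,
                            lam: float, n: int, rng: random.Random) -> List[float]:
    """GBM with log X_0 ~ N(nu, tau^2), observed at T ~ Exp(lam): DPLN draws."""
    out = []
    for _ in range(n):
        log_x0 = rng.gauss(nu, tau)      # lognormal starting value
        t = rng.expovariate(lam)         # exponentially distributed observation time
        w_t = rng.gauss(0.0, math.sqrt(t))
        out.append(math.exp(log_x0 + (mu - 0.5 * sigma**2) * t + sigma * w_t))
    return out


def dpln_tail_exponents(mu: float, sigma: float, lam: float) -> Tuple[float, float]:
    """alpha and beta, where alpha and -beta solve (s^2/2) z^2 + (mu - s^2/2) z - lam = 0."""
    a = 0.5 * sigma**2
    b = mu - 0.5 * sigma**2
    disc = math.sqrt(b * b + 4.0 * a * lam)
    alpha = (-b + disc) / (2.0 * a)  # positive root
    beta = (b + disc) / (2.0 * a)    # negative of the negative root
    return alpha, beta


rng = random.Random(0)
xs = gbm_sample(100.0, 0.05, 0.2, 1.0, 20000, rng)
# min(xs) > 0, and the logs have mean about log(100) + 0.03 and std about 0.2
```

As a consistency check, 1/α - 1/β equals (mu - sigma^2/2)/lam, the mean shift that the random observation time adds to log X.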

The fitted distribution can be used for anomaly detection and for designing pricing structures.
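One way a fitted DPLN could support anomaly detection is to flag customers whose value is extremely unlikely under the fit. This sketch assumes "anomalous" means a two-sided tail probability below a hypothetical threshold (1e-3 here). The paper's actual criterion may differ. The closed-form CDF follows from writing log X as a normal variable plus an asymmetric Laplace variable. The inputs are assumed to be positive and of moderate size, because the exponentials can overflow for extreme inputs.

```python
import math
from typing import List, Sequence


def _phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _phi_c(z: float) -> float:
    """Standard normal upper tail, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def dpln_cdf(x: float, alpha: float, beta: float, nu: float, tau: float) -> float:
    """CDF of DPLN(alpha, beta, nu, tau) at x, clamped to [0, 1] against rounding."""
    if x <= 0:
        return 0.0
    y = math.log(x)
    upper = beta * math.exp(-alpha * y + alpha * nu + 0.5 * alpha**2 * tau**2) \
        * _phi((y - nu - alpha * tau**2) / tau)
    lower = alpha * math.exp(beta * y - beta * nu + 0.5 * beta**2 * tau**2) \
        * _phi_c((y - nu + beta * tau**2) / tau)
    f = _phi((y - nu) / tau) - (upper - lower) / (alpha + beta)
    return min(1.0, max(0.0, f))


def flag_anomalies(values: Sequence[float], alpha: float, beta: float, nu: float,
                   tau: float, threshold: float = 1e-3) -> List[int]:
    """Indices of values whose two-sided tail probability is below threshold."""
    flagged = []
    for i, v in enumerate(values):
        f = dpln_cdf(v, alpha, beta, nu, tau)
        if min(f, 1.0 - f) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical fitted parameters, for illustration only.
params = (2.5, 1.5, 0.0, 0.5)
flags = flag_anomalies([1.0, math.exp(6), math.exp(-6)], *params)
```

With these illustrative parameters, the unusually large and unusually small values (indices 1 and 2) are flagged, while the typical value 1.0 is not.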