Seshadri et al KDD'08

From Cohen Courses
Revision as of 19:58, 4 February 2011 by Diliu

In this paper, the authors investigate node and edge properties of a massive phone call dataset. They focus on three per-customer metrics: the number of phone calls, the total talk minutes, and the number of callers.

The major contribution of this paper is the finding that the power law, which is widely used in the social media analysis community, does not fit the data well. Instead, the authors develop a method to fit the data to a Double Pareto LogNormal (DPLN) distribution.
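To make the shape of a DPLN concrete, here is a minimal sampling sketch. It assumes the standard representation of a DPLN variate as a lognormal multiplied by one Pareto variable and divided by another, so that log X is a normal plus a difference of two scaled exponentials. The parameter names (alpha, beta, nu, tau) and the values below are illustrative assumptions, not values from the paper, and this is not the authors' fitting method.

```python
import math
import random

def sample_dpln(alpha, beta, nu, tau, rng):
    """Draw one DPLN variate (illustrative sketch).

    Uses log X = Z + E1/alpha - E2/beta, where Z ~ Normal(nu, tau^2)
    and E1, E2 are independent standard exponentials.
    alpha shapes the upper tail, beta the lower tail.
    """
    z = rng.gauss(nu, tau)
    e1 = rng.expovariate(1.0)
    e2 = rng.expovariate(1.0)
    return math.exp(z + e1 / alpha - e2 / beta)

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
samples = [sample_dpln(2.5, 1.5, 3.0, 0.5, rng) for _ in range(10000)]

# The mean of log X should be close to nu + 1/alpha - 1/beta.
mean_log = sum(math.log(s) for s in samples) / len(samples)
expected_mean_log = 3.0 + 1.0 / 2.5 - 1.0 / 1.5
```

Plotting a histogram of `samples` on log-log axes would show a lognormal-like body with power-law behavior in both tails, which is the kind of shape a single power law cannot capture.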

The DPLN is based on Geometric Brownian Motion, which is widely used in finance to model stock price movements. Geometric Brownian Motion is driven by a Wiener process, and its exponential term guarantees that values can never be negative. The name "lognormal" reflects both parts: the Wiener process has normally distributed increments, and the exponential term means that the logarithm of the value is normally distributed.
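The role of the exponential term can be seen in a small simulation. The sketch below discretizes Geometric Brownian Motion using its exact log-space update; the function name `simulate_gbm` and all parameter values are assumptions made for illustration and do not come from the paper.

```python
import math
import random

def simulate_gbm(x0, mu, sigma, dt, n_steps, rng):
    """Simulate one Geometric Brownian Motion path (illustrative sketch).

    Each step multiplies the current value by
    exp((mu - sigma^2 / 2) * dt + sigma * dW),
    where dW is a Wiener increment with distribution Normal(0, dt).
    """
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Wiener process increment
        step = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw)
        path.append(path[-1] * step)
    return path

# Hypothetical drift, volatility, and step size.
path = simulate_gbm(1.0, 0.05, 0.2, 0.01, 1000, random.Random(1))
```

Because every update multiplies by an exponential, the path stays strictly positive whenever `x0` is positive, and the logarithm of the value at any fixed time is normally distributed.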

The fit can be used for anomaly detection and for designing pricing structures. The authors are also interested in how the data evolve over time (they refer to this as a generative process). They slice time into discrete intervals and look at the ratio of each metric across intervals. Based on the results of the fit, a lognormal multiplicative process describes the data very well.
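A lognormal multiplicative process can be sketched as follows: in each time slice, a customer's metric is multiplied by an independent lognormal factor, so the log of the ratio between consecutive slices is normally distributed. The helper names, the number of customers and slices, and the parameter values are hypothetical; the paper's actual data and estimation procedure are not reproduced here.

```python
import math
import random
import statistics

def evolve(x0, mu, sigma, n_slices, rng):
    """Evolve one metric over time slices by lognormal multiplicative steps."""
    values = [x0]
    for _ in range(n_slices):
        values.append(values[-1] * rng.lognormvariate(mu, sigma))
    return values

def log_ratios(values):
    """Log of the ratio between each pair of consecutive time slices."""
    return [math.log(b / a) for a, b in zip(values, values[1:])]

# Hypothetical setup: 2000 customers observed over 10 time slices each.
rng = random.Random(42)
all_log_ratios = []
for _ in range(2000):
    series = evolve(100.0, 0.0, 0.3, 10, rng)
    all_log_ratios.extend(log_ratios(series))

ratio_mean = statistics.mean(all_log_ratios)
ratio_std = statistics.stdev(all_log_ratios)
```

On real data the check would run in the opposite direction: compute the log ratios of a metric between time slices and test whether they look approximately normal.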