Seshadri et al. KDD'08
Latest revision as of 19:17, 4 February 2011
Citation
Seshadri, Mukund and Machiraju, Sridhar and Sridharan, Ashwin and Bolot, Jean and Faloutsos, Christos and Leskovec, Jure. Mobile call graphs: beyond power-law and lognormal distributions. In KDD'08.
Summary
This paper is about model fitting. The authors found that the power law, which is widely used in the social media analysis community, sometimes does not fit the data well. They developed a method to fit the data to a Double Pareto LogNormal (DPLN) distribution instead, and demonstrated it on a massive phone call dataset with more than one million users.
Brief Description
The authors are interested in three metrics: the number of phone calls per customer, the total talk minutes per customer, and the number of callers per customer. The first and last metrics are related to out-degrees, while the second is specific to phone call datasets.
The DPLN is based on Geometric Brownian Motion, which is widely used in finance to model the movement of stock prices over time. The formula for a Geometric Brownian Motion is:
<math>dS_t=\mu S_t dt+\sigma S_t dw_t</math>
Here <math>w_t</math> represents a Wiener process, which is a continuous-time Markov process with independent increments. The randomness comes from this term. We can understand <math>\mu</math> as the drift term and <math>\sigma</math> as the volatility (diffusion) term. One important result for Geometric Brownian Motion is that <math>\frac{S_t}{S_0}</math> follows a log-normal distribution.
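The log-normal result can be checked numerically. The sketch below assumes the standard closed-form solution of a Geometric Brownian Motion, <math>S_t = S_0 \exp((\mu - \sigma^2/2)t + \sigma w_t)</math>, so that <math>\ln(S_t/S_0)</math> should be normal with mean <math>(\mu - \sigma^2/2)t</math> and standard deviation <math>\sigma\sqrt{t}</math>. The function name and the parameter values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random
import statistics


def sample_gbm_log_ratio(mu, sigma, t, rng):
    """Draw one sample of ln(S_t / S_0) from the closed-form GBM solution."""
    # The Wiener process at time t is normally distributed with variance t.
    w_t = rng.gauss(0.0, math.sqrt(t))
    return (mu - 0.5 * sigma ** 2) * t + sigma * w_t


# Hypothetical parameters (illustration only, not from the paper).
MU, SIGMA, T = 0.05, 0.2, 1.0

rng = random.Random(0)  # fixed seed so the run is repeatable
log_ratios = [sample_gbm_log_ratio(MU, SIGMA, T, rng) for _ in range(100_000)]

sample_mean = statistics.fmean(log_ratios)
sample_std = statistics.stdev(log_ratios)
expected_mean = (MU - 0.5 * SIGMA ** 2) * T
expected_std = SIGMA * math.sqrt(T)

print("mean of ln(S_t/S_0):", sample_mean, "expected:", expected_mean)
print("std of ln(S_t/S_0): ", sample_std, "expected:", expected_std)
```

Because the logarithm of the ratio is normal, the ratio itself is log-normal. Note that the mean of the log ratio is <math>(\mu - \sigma^2/2)t</math> rather than <math>\mu t</math>.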
By introducing this randomness, the authors obtain a very good fit to the data.
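One way to see how a DPLN can arise from a Geometric Brownian Motion is the usual construction in which the process starts from a log-normal initial value and is observed after an exponentially distributed random time. The sketch below assumes that construction. This summary does not spell it out, so treat it as an illustration and not as the authors' fitting procedure. All names and parameter values (`nu`, `tau`, `lam`, and so on) are hypothetical.

```python
import math
import random
import statistics


def sample_log_dpln(nu, tau, mu, sigma, lam, rng):
    """Sample ln(S_T) for a GBM stopped at a random time T ~ Exp(lam).

    Assumed construction: ln(S_0) ~ N(nu, tau^2), then the GBM runs for T.
    """
    log_s0 = rng.gauss(nu, tau)          # log-normal starting value
    t = rng.expovariate(lam)             # random observation time
    w_t = rng.gauss(0.0, math.sqrt(t))   # Wiener process at time t
    return log_s0 + (mu - 0.5 * sigma ** 2) * t + sigma * w_t


# Hypothetical parameters (illustration only).
NU, TAU, MU, SIGMA, LAM = 0.0, 0.5, 0.1, 0.4, 1.0

rng = random.Random(1)
logs = [sample_log_dpln(NU, TAU, MU, SIGMA, LAM, rng) for _ in range(200_000)]

drift = MU - 0.5 * SIGMA ** 2
# Moments implied by the construction, using E[T] = 1/lam, Var[T] = 1/lam^2.
expected_mean = NU + drift / LAM
expected_var = TAU ** 2 + SIGMA ** 2 / LAM + drift ** 2 / LAM ** 2

sample_mean = statistics.fmean(logs)
sample_var = statistics.variance(logs)
print("mean of ln(S_T):", sample_mean, "expected:", expected_mean)
print("var of ln(S_T): ", sample_var, "expected:", expected_var)
```

Under this assumption, the random stopping time is what turns the log-normal body into a distribution with power-law-like tails on both sides.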
The fit can be used for anomaly detection and for designing pricing structures. The authors are also interested in how the data evolve over time (which they refer to as a generative process). They sliced time into discrete pieces and looked at the ratio of each metric over time. Based on the result of the fit, a lognormal multiplicative process fits the data very well.
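The time-slice analysis can be sketched on synthetic data. The example below assumes a lognormal multiplicative process, in which a customer's metric in each slice is the previous value multiplied by an independent log-normal factor. It then computes the ratios between consecutive slices and checks that their logarithms have the expected mean and standard deviation. The data, function names, and parameters are all hypothetical; this is not the authors' dataset or code.

```python
import math
import random
import statistics


def simulate_multiplicative(x0, n_slices, m, s, rng):
    """Simulate X_{k+1} = X_k * R_k with ln(R_k) ~ N(m, s^2)."""
    values = [x0]
    for _ in range(n_slices):
        values.append(values[-1] * rng.lognormvariate(m, s))
    return values


def slice_ratios(values):
    """Ratio of each time slice's value to the previous slice's value."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


# Hypothetical setup: 5000 synthetic customers observed over 12 time slices.
M, S = 0.02, 0.3
rng = random.Random(2)

log_ratios = []
for _ in range(5000):
    series = simulate_multiplicative(x0=100.0, n_slices=12, m=M, s=S, rng=rng)
    log_ratios.extend(math.log(r) for r in slice_ratios(series))

ratio_log_mean = statistics.fmean(log_ratios)
ratio_log_std = statistics.stdev(log_ratios)
print("mean of log ratios:", ratio_log_mean, "expected:", M)
print("std of log ratios: ", ratio_log_std, "expected:", S)
```

On real data, the same `slice_ratios` step would be applied to each customer's per-slice metric, and the pooled log ratios could then be compared against a normal distribution.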