Power law fitting

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin

This is a technical method discussed in Social Media Analysis 10-802 in Spring 2010.

Estimating the Power-Law exponent from empirical data

There are many ways of estimating the value of the scaling exponent for a power-law tail; however, not all of them yield unbiased and consistent answers. The most reliable techniques are often based on the method of maximum likelihood. Alternative methods are often based on a linear regression on the log-log probability, the log-log cumulative distribution function, or log-binned data, but these approaches should be avoided, as they can all lead to highly biased estimates of the scaling exponent.

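For real-valued data, we fit a power-law distribution of the form

$$p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}$$

to the data $x \geq x_{\min}$. Given a choice for $x_{\min}$, a simple derivation by this method yields the estimator equation

$$\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$$

where $\{x_i\}$ are the $n$ data points $x_i \geq x_{\min}$. This estimator exhibits a small finite sample-size bias of order $O(n^{-1})$, which is small when $n > 100$. Further, the uncertainty in the estimation can be derived from the maximum likelihood argument, and has the form $\sigma = \frac{\hat{\alpha} - 1}{\sqrt{n}}$. This estimator is equivalent to the popular Hill estimator from quantitative finance and extreme value theory.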
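The continuous estimator and its uncertainty can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `fit_continuous_power_law` and `sample_power_law`, the seed, and the synthetic data are assumptions introduced here, and the estimator is simply 1 + n / Σ ln(x_i / x_min) with standard error (α̂ − 1) / √n as stated above.

```python
import math
import random


def fit_continuous_power_law(data, x_min):
    """Return (alpha_hat, sigma, n) for the observations with x >= x_min.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)),  sigma = (alpha_hat - 1) / sqrt(n).
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations with x >= x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("all tail observations equal x_min; exponent is undefined")
    alpha_hat = 1.0 + n / log_sum
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma, n


def sample_power_law(alpha, x_min, size, rng):
    """Hypothetical helper: inverse-transform samples from a pure power law.

    Uses P(X > x) = (x / x_min)^(-(alpha - 1)), so x = x_min * (1 - u)^(-1 / (alpha - 1)).
    """
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = sample_power_law(alpha=2.5, x_min=1.0, size=10_000, rng=rng)
    alpha_hat, sigma, n = fit_continuous_power_law(data, x_min=1.0)
    print(f"alpha_hat = {alpha_hat:.3f} +/- {sigma:.3f} (n = {n})")
```

On synthetic data drawn from a pure power law, the estimate should land within a few standard errors of the true exponent; on real data the result also depends on the chosen lower cutoff.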

For a set of $n$ integer-valued data points $\{x_i\}$, again where each $x_i \geq x_{\min}$, the maximum likelihood exponent is the solution to the transcendental equation

$$\frac{\zeta'(\hat{\alpha}, x_{\min})}{\zeta(\hat{\alpha}, x_{\min})} = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i$$

where $\zeta(\alpha, x_{\min})$ is the incomplete zeta function. The uncertainty in this estimate follows the same formula as for the continuous equation. However, the two equations for $\hat{\alpha}$ are not equivalent, and the continuous version should not be applied to discrete data, nor vice versa.
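Because the discrete equation has no closed-form solution, it has to be solved numerically. The sketch below is one possible approach under stated assumptions, not the code from the review article: it takes the incomplete zeta function to be ζ(α, x_min) = Σ_{k ≥ x_min} k^(−α), with the prime denoting the derivative with respect to α, approximates both by a direct sum plus an Euler-Maclaurin tail correction, and finds the root by bisection. The names `hurwitz_zeta_and_derivative` and `fit_discrete_power_law`, the number of summed terms, and the bracket limits are all choices made for this illustration.

```python
import math


def hurwitz_zeta_and_derivative(alpha, x_min, n_terms=1000):
    """Approximate zeta(alpha, x_min) = sum_{k >= x_min} k^(-alpha) and d(zeta)/d(alpha).

    The first n_terms terms are summed directly; the rest of the series is
    replaced by an Euler-Maclaurin tail (integral + half term + first correction).
    x_min must be a positive integer and alpha must exceed 1.
    """
    z = 0.0
    dz = 0.0
    for k in range(x_min, x_min + n_terms):
        term = k ** (-alpha)
        z += term
        dz -= math.log(k) * term
    m = float(x_min + n_terms)  # first index that was not summed directly
    log_m = math.log(m)
    integral = m ** (1.0 - alpha) / (alpha - 1.0)
    half = 0.5 * m ** (-alpha)
    corr = alpha * m ** (-alpha - 1.0) / 12.0
    z += integral + half + corr
    # Derivatives of the three tail terms with respect to alpha.
    dz += -log_m * integral - integral / (alpha - 1.0)
    dz += -log_m * half
    dz += m ** (-alpha - 1.0) / 12.0 - log_m * corr
    return z, dz


def fit_discrete_power_law(data, x_min, lo=1.0001, hi=2.0, tol=1e-10):
    """Solve zeta'(a, x_min) / zeta(a, x_min) = -mean(ln x_i) for a by bisection."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two observations with x >= x_min")
    mean_log = sum(math.log(x) for x in tail) / n
    if mean_log <= math.log(x_min):
        raise ValueError("all tail observations equal x_min; exponent is undefined")

    def g(alpha):
        z, dz = hurwitz_zeta_and_derivative(alpha, x_min)
        return dz / z + mean_log

    # g increases with alpha, so widen the bracket until the sign changes.
    while g(hi) < 0.0:
        hi *= 2.0
        if hi > 64.0:
            raise ValueError("no root found below alpha = 64")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    alpha_hat = 0.5 * (lo + hi)
    # Uncertainty taken from the continuous formula, as the text states.
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma, n


if __name__ == "__main__":
    sample = [1, 1, 1, 1, 2, 2, 3, 5, 10]  # toy data, far too small for a real fit
    print(fit_discrete_power_law(sample, x_min=1))
```

Bisection is used here only because the left-hand side is monotone in α, which makes the bracket easy to reason about; any one-dimensional root finder, or direct maximization of the log-likelihood, would serve equally well.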

Further, both of these estimators require the choice of $x_{\min}$. For distributions with a non-trivial slowly varying function $L(x)$, that is, distributions that follow a pure power law only in the upper tail, choosing $x_{\min}$ too small produces a significant bias in $\hat{\alpha}$, while choosing it too large increases the uncertainty in $\hat{\alpha}$ and reduces the statistical power of our model. In general, the best choice of $x_{\min}$ depends strongly on the particular form of the lower tail, represented by $L(x)$.
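The trade-off in choosing the lower cutoff can be made concrete by refitting the continuous estimator over a range of candidate cutoffs. The following is an illustrative sketch only: the synthetic data set (a uniform body below a true cutoff of 10 glued to a power-law tail with exponent 2.5), the candidate list, and the helper names `make_synthetic_data`, `alpha_and_sigma`, and `scan_x_min` are all assumptions made for the example, and the scan itself is a diagnostic, not a selection rule.

```python
import math
import random


def alpha_and_sigma(data, x_min):
    """Continuous maximum likelihood estimate and its standard error for x >= x_min."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    return alpha_hat, (alpha_hat - 1.0) / math.sqrt(n), n


def scan_x_min(data, candidates, min_tail=10):
    """Return (x_min, alpha_hat, sigma, n) for each candidate with enough tail points."""
    rows = []
    for x_min in candidates:
        n_tail = sum(1 for x in data if x >= x_min)
        if n_tail < min_tail:
            continue  # too few points for the estimate to be meaningful
        rows.append((x_min, *alpha_and_sigma(data, x_min)))
    return rows


def make_synthetic_data(rng, n_body=5000, n_tail=5000, true_x_min=10.0, alpha=2.5):
    """Hypothetical test data: a non-power-law body below true_x_min, a power law above."""
    body = [rng.uniform(1.0, true_x_min) for _ in range(n_body)]
    tail = [true_x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n_tail)]
    return body + tail


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    data = make_synthetic_data(rng)
    for x_min, alpha_hat, sigma, n in scan_x_min(data, [1, 2, 5, 10, 20, 50, 100]):
        print(f"x_min = {x_min:>5}: alpha_hat = {alpha_hat:.3f} +/- {sigma:.3f} (n = {n})")
```

With data built this way, cutoffs below the true one pull the estimate away from 2.5 because the body is included in the fit, while cutoffs above it leave the estimate roughly unbiased but shrink n and so widen the standard error.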

More about these methods, and the conditions under which they can be used, can be found in the Clauset et al. reference below. Further, this comprehensive review article provides usable code (Matlab and R) for estimation and testing routines for power-law distributions.