# Power law fitting

This is a technical method discussed in Social Media Analysis 10-802 in Spring 2010.

### Estimating the Power-Law exponent from empirical data

There are many ways of estimating the scaling exponent of a power-law tail, but not all of them yield unbiased and consistent answers. The most reliable techniques are often based on the method of maximum likelihood. Alternative methods typically fit a linear regression to the log-log probability, the log-log cumulative distribution function, or log-binned data; these approaches should be avoided because they can all lead to highly biased estimates of the scaling exponent.

For real-valued data, we fit a power-law distribution of the form

${\displaystyle p(x)={\frac {\alpha -1}{x_{\min }}}\left({\frac {x}{x_{\min }}}\right)^{-\alpha }}$

to the data ${\displaystyle x\geq x_{\min }}$. Given a choice for ${\displaystyle x_{\min }}$, setting the derivative of the log-likelihood with respect to ${\displaystyle \alpha }$ to zero yields the maximum likelihood estimator

${\displaystyle {\hat {\alpha }}=1+n\left[\sum _{i=1}^{n}\ln {\frac {x_{i}}{x_{\min }}}\right]^{-1}}$

where ${\displaystyle \{x_{i}\}}$ are the ${\displaystyle n}$ data points ${\displaystyle x_{i}\geq x_{\min }}$. This estimator exhibits a finite sample-size bias of order ${\displaystyle O(n^{-1})}$, which is small when ${\displaystyle n>100}$. Further, the uncertainty in the estimate can be derived from the maximum likelihood argument, and has the form ${\displaystyle \sigma ={\frac {{\hat {\alpha }}-1}{\sqrt {n}}}}$. This estimator is equivalent to the popular Hill estimator from quantitative finance and extreme value theory.
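The continuous estimator and its uncertainty can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `fit_continuous_power_law` and `power_law_quantiles` are hypothetical, and the synthetic sample is built by evaluating the inverse CDF ${\displaystyle x=x_{\min }(1-u)^{-1/(\alpha -1)}}$ at evenly spaced values of ${\displaystyle u}$, so the result is reproducible without random numbers.

```python
import math


def fit_continuous_power_law(data, x_min):
    """Return (alpha_hat, sigma, n) for the observations with x >= x_min.

    alpha_hat = 1 + n / sum(ln(x_i / x_min)) and
    sigma     = (alpha_hat - 1) / sqrt(n), as in the formulas above.
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma, n


def power_law_quantiles(alpha, x_min, n):
    """Hypothetical helper: a deterministic synthetic sample obtained by
    evaluating the power-law inverse CDF at n evenly spaced midpoints."""
    return [
        x_min * (1.0 - (i + 0.5) / n) ** (-1.0 / (alpha - 1.0))
        for i in range(n)
    ]


# Synthetic data with a known exponent (alpha = 2.5 is an arbitrary choice).
sample = power_law_quantiles(alpha=2.5, x_min=1.0, n=1000)
alpha_hat, sigma, n = fit_continuous_power_law(sample, x_min=1.0)
```

On this synthetic sample the estimate should land close to the exponent used to generate it, with `sigma` shrinking as the sample grows.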

For a set of n integer-valued data points ${\displaystyle \{x_{i}\}}$, again where each ${\displaystyle x_{i}\geq x_{\min }}$, the maximum likelihood exponent is the solution to the transcendental equation

${\displaystyle {\frac {\zeta '({\hat {\alpha }},x_{\min })}{\zeta ({\hat {\alpha }},x_{\min })}}=-{\frac {1}{n}}\sum _{i=1}^{n}\ln x_{i}}$

where ${\displaystyle \zeta (\alpha ,x_{\min })=\sum _{k=0}^{\infty }(k+x_{\min })^{-\alpha }}$ is the incomplete (Hurwitz) zeta function and the prime denotes differentiation with respect to ${\displaystyle \alpha }$. The uncertainty in this estimate is approximately given by the same formula as in the continuous case. However, the two equations for ${\displaystyle {\hat {\alpha }}}$ are not equivalent, and the continuous version should not be applied to discrete data, nor vice versa.
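Because the discrete equation has no closed-form solution, it has to be solved numerically. The sketch below takes one possible route, under stated assumptions: rather than root-finding on the equation itself, it minimizes the negative log-likelihood per data point, ${\displaystyle \ln \zeta (\alpha ,x_{\min })+{\frac {\alpha }{n}}\sum _{i}\ln x_{i}}$, whose stationary point is the maximum likelihood exponent. The names `hurwitz_zeta` and `fit_discrete_power_law`, the search interval `[1.05, 6.0]`, and the truncated-sum approximation of the zeta function are all choices made for this illustration.

```python
import math


def hurwitz_zeta(s, a, n_terms=1000):
    """Approximate sum_{k>=0} (k + a)^(-s) for s > 1 and a > 0.

    The first n_terms are summed directly; the remainder is approximated
    by the leading Euler-Maclaurin terms (integral, half-term, derivative).
    """
    head = sum((k + a) ** (-s) for k in range(n_terms))
    b = n_terms + a
    tail = b ** (1.0 - s) / (s - 1.0) + 0.5 * b ** (-s) + s * b ** (-s - 1.0) / 12.0
    return head + tail


def fit_discrete_power_law(data, x_min, lo=1.05, hi=6.0, tol=1e-8):
    """Return (alpha_hat, sigma, n) for integer data with x >= x_min.

    Golden-section search over [lo, hi] (an assumed bracket) on the
    negative log-likelihood per point, which is convex in alpha.
    """
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no observations at or above x_min")
    mean_log = sum(math.log(x) for x in tail) / n

    def neg_loglik(alpha):
        return math.log(hurwitz_zeta(alpha, x_min)) + alpha * mean_log

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = neg_loglik(c), neg_loglik(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = neg_loglik(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = neg_loglik(d)
    alpha_hat = 0.5 * (a + b)
    # Approximate uncertainty: reuses the continuous-case formula.
    sigma = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, sigma, n


# Hypothetical integer-valued observations, e.g. node degrees.
degrees = [1, 1, 1, 2, 1, 3, 1, 2, 5, 1, 1, 9, 2, 1, 4]
alpha_hat, sigma, n = fit_discrete_power_law(degrees, x_min=1)
```

If the returned exponent sits at either end of the search interval, the bracket was too narrow for the data and should be widened.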

Further, both of these estimators require a choice of ${\displaystyle x_{\min }}$. When the distribution follows a power law only in its tail, that is, ${\displaystyle p(x)\propto L(x)\,x^{-\alpha }}$ with a non-trivial slowly varying function ${\displaystyle L(x)}$, choosing ${\displaystyle x_{\min }}$ too small produces a significant bias in ${\displaystyle {\hat {\alpha }}}$, while choosing it too large increases the uncertainty in ${\displaystyle {\hat {\alpha }}}$ and reduces the statistical power of the model. In general, the best choice of ${\displaystyle x_{\min }}$ depends strongly on the particular form of the lower tail, represented by ${\displaystyle L(x)}$.
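The trade-off in choosing ${\displaystyle x_{\min }}$ can be made concrete by refitting the continuous estimator over a range of candidate cutoffs. The sketch below is only an illustration of that sensitivity, not a selection procedure: the function names are hypothetical, and the synthetic data set (a uniform bulk on ${\displaystyle [1,10)}$ plus a power-law tail with ${\displaystyle \alpha =2.5}$ starting at 10) is an assumption made for the example.

```python
import math


def continuous_alpha(data, x_min):
    """Continuous MLE restricted to x >= x_min: returns (n, alpha_hat, sigma)."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    return n, alpha_hat, (alpha_hat - 1.0) / math.sqrt(n)


def scan_x_min(data, candidates):
    """Refit for each candidate cutoff; rows are (x_min, n, alpha_hat, sigma)."""
    return [(x_min,) + continuous_alpha(data, x_min) for x_min in candidates]


# Hypothetical synthetic data: the power law holds only above x = 10.
n_bulk, n_tail = 2000, 2000
bulk = [1.0 + 9.0 * (i + 0.5) / n_bulk for i in range(n_bulk)]
# Inverse CDF of a power law with alpha = 2.5, x_min = 10 at evenly spaced points.
tail = [10.0 * ((j + 0.5) / n_tail) ** (-1.0 / 1.5) for j in range(n_tail)]

rows = scan_x_min(bulk + tail, [1.0, 2.0, 5.0, 10.0, 20.0, 50.0])
for x_min, n, alpha_hat, sigma in rows:
    print(f"x_min={x_min:5.1f}  n={n:5d}  alpha_hat={alpha_hat:.3f}  sigma={sigma:.3f}")
```

In this construction, cutoffs below 10 pull non-power-law points into the fit and bias the estimate, while cutoffs above 10 recover roughly the same exponent from fewer points, so `sigma` grows.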

More detail on these methods, and on the conditions under which they can be used, can be found in the Clauset et al. reference below. That comprehensive review article also provides usable code (MATLAB and R) for estimation and testing routines for power-law distributions.