Power Laws and the Pareto Distribution

Power Laws, Pareto Distribution and Zipf's Law

Summarizing: [newman-2006-archiv:cond-mat]

Many measured quantities concentrate around a mean value.

Centered Distributions

images/newman2006fig1.png

Other Distributions

But not all measured quantities concentrate nicely around a mean.

Some vary over an enormous range, sometimes many orders of magnitude. Example: the populations of 2700 US cities.

America’s total population of 300 million people is only enough for about 40 cities the size of New York. Yet America’s 2700 cities cannot have a mean population of more than 300 million / 2700 ≈ 110,000, so New York is dozens of times larger than even the largest possible mean.

City Populations

images/newman2006fig2a.png

Source: Newman (2006)

A histogram of city sizes plotted with logarithmic horizontal and vertical axes follows a straight line quite closely.

images/newman2006fig2b.png

Source: Newman (2006)

Power Law Distribution

$\ln y = -a \ln x + c$

$y = f(x) = C x^{-a}$, where $C = e^{c}$
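To see the straight-line claim numerically, the following sketch draws synthetic samples (not the city data) from a pdf $f(x) = C x^{-a}$ with $a = 2.5$ and $x \ge 1$, builds a histogram with logarithmically spaced bins, and fits a line to $\ln f$ versus $\ln x$. The helper name `loglog_slope` and the binning choices (30 bins, dropping bins with fewer than 10 counts) are illustrative assumptions. A least-squares fit to a log-log histogram is only a crude estimator of the exponent; the maximum likelihood estimator discussed later in these notes is generally preferable.

```python
import numpy as np

def loglog_slope(samples, nbins=30, min_count=10):
    """Least-squares slope of a log-log histogram built with logarithmic bins."""
    edges = np.logspace(np.log10(samples.min()), np.log10(samples.max()), nbins + 1)
    counts, edges = np.histogram(samples, bins=edges)
    density = counts / (np.diff(edges) * samples.size)  # histogram normalized to a pdf
    centers = np.sqrt(edges[:-1] * edges[1:])           # geometric bin centers
    keep = counts >= min_count                          # drop empty and sparse tail bins
    slope, _intercept = np.polyfit(np.log(centers[keep]), np.log(density[keep]), 1)
    return slope

# Synthetic data: pdf f(x) = C x**(-a) for x >= 1, drawn by inverse-transform sampling.
a = 2.5
rng = np.random.default_rng(0)
u = 1.0 - rng.random(10**6)        # uniform on (0, 1]
samples = u ** (-1.0 / (a - 1.0))  # survival function Pr(X > x) = x**(-(a - 1))
slope = loglog_slope(samples)
print(f"estimated slope {slope:.3f}, expected {-a}")
```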

A power law (with $a > 0$) implies that small values of $x$ are very common and large values of $x$ are quite rare.

(See Mathematica notebook.)

Pareto Distribution

Note: these notes draw on http://en.wikipedia.org/wiki/Pareto_distribution and http://www.math.uah.edu/stat/special/Pareto.pdf.

Define Pareto Distribution

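For $x \ge x_m$ we have

$\Pr(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}$,

where $x_m > 0$ is the minimum (cutoff) value and $\alpha > 0$ is the Pareto index.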

CDF

For $x \ge x_m$

$F_X(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}$

Density function

Differentiating w.r.t. x, the probability density function is

$f_X(x) = \frac{\alpha x_m^{\alpha}}{x^{\alpha+1}}$
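A minimal NumPy sketch of the three functions above, assuming scalar parameters `x_m > 0` and `alpha > 0` (the function names are hypothetical). It checks numerically that the density is minus the derivative of $\Pr(X > x)$, which is the differentiation step stated above.

```python
import numpy as np

def pareto_sf(x, x_m, alpha):
    """Survival function Pr(X > x): (x_m / x)**alpha for x >= x_m, and 1 below x_m."""
    x = np.asarray(x, dtype=float)
    return (x_m / np.maximum(x, x_m)) ** alpha

def pareto_cdf(x, x_m, alpha):
    """CDF F_X(x) = 1 - Pr(X > x); equals 0 for x <= x_m."""
    return 1.0 - pareto_sf(x, x_m, alpha)

def pareto_pdf(x, x_m, alpha):
    """Density alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, and 0 below x_m."""
    x = np.asarray(x, dtype=float)
    density = alpha * x_m**alpha / np.maximum(x, x_m) ** (alpha + 1)
    return np.where(x >= x_m, density, 0.0)

# Check that the density is minus the derivative of the survival function.
x_m, alpha = 1.0, 2.5
xs = np.array([1.5, 2.0, 5.0, 10.0])
h = 1e-6
numeric_pdf = -(pareto_sf(xs + h, x_m, alpha) - pareto_sf(xs - h, x_m, alpha)) / (2 * h)
print(numeric_pdf)
print(pareto_pdf(xs, x_m, alpha))
```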

Moments (Mean and Variance)

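If $\alpha \le 1$, the expected value is infinite. If $\alpha > 1$, then

$E(X) = \frac{\alpha x_m}{\alpha - 1}$

If $\alpha \le 2$, the variance does not exist (it is infinite for $1 < \alpha \le 2$). If $\alpha > 2$, then

$\operatorname{var}(X) = \left(\frac{x_m}{\alpha - 1}\right)^2 \frac{\alpha}{\alpha - 2}$.

The raw moments are

$\mu_n' = \frac{\alpha x_m^n}{\alpha - n}$

A finite $n$-th moment exists only for $n < \alpha$.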
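The moment formulas can be checked against each other: the variance should equal $\mu_2' - (\mu_1')^2$. The sketch below (hypothetical helper names, NumPy assumed) encodes the formulas, returns `inf` where a moment is infinite or does not exist, and compares the mean with a Monte Carlo average for $\alpha = 3$, where the variance is finite so the sample mean settles quickly.

```python
import numpy as np

def pareto_raw_moment(n, x_m, alpha):
    """Raw moment E(X**n) = alpha * x_m**n / (alpha - n); the integral diverges (inf) for n >= alpha."""
    return alpha * x_m**n / (alpha - n) if n < alpha else np.inf

def pareto_mean(x_m, alpha):
    """E(X) = alpha * x_m / (alpha - 1) for alpha > 1, else inf."""
    return pareto_raw_moment(1, x_m, alpha)

def pareto_var(x_m, alpha):
    """var(X) = (x_m / (alpha - 1))**2 * alpha / (alpha - 2) for alpha > 2.

    For alpha <= 2 the variance does not exist; inf is returned as a simple stand-in.
    """
    if alpha <= 2:
        return np.inf
    return (x_m / (alpha - 1)) ** 2 * alpha / (alpha - 2)

x_m, alpha = 2.0, 3.0
print(pareto_mean(x_m, alpha), pareto_var(x_m, alpha))  # 3.0 3.0
# The variance formula agrees with mu_2' - (mu_1')**2:
print(pareto_raw_moment(2, x_m, alpha) - pareto_mean(x_m, alpha) ** 2)  # 3.0

# Monte Carlo check of the mean (finite variance since alpha > 2).
rng = np.random.default_rng(1)
sample = x_m * (1.0 - rng.random(10**6)) ** (-1.0 / alpha)
print(sample.mean())
```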

Conditional Distribution

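Here is a cool property of the Pareto variate. For $x_1 \ge x_m$ and $x \ge x_1$, the conditional probability

$\Pr(X > x \mid X > x_1) = \frac{\Pr(X > x)}{\Pr(X > x_1)} = \left(\frac{x_1}{x}\right)^{\alpha}$

is also a Pareto distribution. It has the same Pareto index $\alpha$ but has cutoff $x_1$.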
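A quick Monte Carlo illustration of this property, as a sketch with arbitrarily chosen values $x_m = 1$, $\alpha = 2$, $x_1 = 3$: draw Pareto samples by inverse-transform sampling, keep those above $x_1$, and compare the empirical conditional survival probabilities with $(x_1/x)^{\alpha}$.

```python
import numpy as np

rng = np.random.default_rng(2)
x_m, alpha, x_1 = 1.0, 2.0, 3.0
x = x_m * (1.0 - rng.random(10**6)) ** (-1.0 / alpha)  # Pareto(x_m, alpha) samples
tail = x[x > x_1]                                       # keep only observations with X > x_1

# Empirical Pr(X > t | X > x_1) versus the Pareto survival (x_1 / t)**alpha.
checks = [(t, np.mean(tail > t), (x_1 / t) ** alpha) for t in (4.0, 6.0, 12.0)]
for t, empirical, predicted in checks:
    print(f"t={t}: empirical {empirical:.4f}, predicted {predicted:.4f}")
print(tail.min())  # just above x_1, the new cutoff
```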

Applications

(selected from Wikipedia)

Application to Wealth and Income

Wealth and income distribution
  • the pdf shows that the fraction of the population owning a small amount of wealth per person is rather high, and that this fraction decreases steadily as wealth increases.
  • Pareto used this to describe the distribution of wealth and of income among individuals
    • most wealth of any society is owned by a small percentage of the people in that society
    • Pareto principle: "80-20 rule" (20% of the population controls 80% of the wealth)

Pareto principle

The 80-20 law holds when the Pareto index is $\alpha = \log_4 5 \approx 1.161$.

80-20 law
  • 20% of all people receive 80% of all income
  • within the top 20%, the richest 20% of that group receive 80% of the group's income
  • and so on, recursively

(This excludes $0 < \alpha \le 1$, which implies an infinite expected value and thus cannot model an income distribution.)
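The 80-20 numbers can be verified from the Pareto Lorenz curve $L(F) = 1 - (1-F)^{1-1/\alpha}$ given later in these notes: the richest fraction $p$ of the population receives the share $1 - L(1-p) = p^{1-1/\alpha}$ of total income. With $\alpha = \log_4 5$ we have $1 - 1/\alpha = \ln(5/4)/\ln 5$, so $0.2^{1-1/\alpha} = e^{-\ln(5/4)} = 0.8$ exactly. A small Python sketch (the helper `top_share` is hypothetical) also shows the recursive step: the top 20% of the top 20%, i.e. the top 4% of everyone, receive $0.8^2 = 64\%$ of all income.

```python
import math

def top_share(p, alpha):
    """Share of total income received by the richest fraction p, from L(F) = 1 - (1 - F)**(1 - 1/alpha)."""
    return p ** (1.0 - 1.0 / alpha)

alpha_80_20 = math.log(5, 4)                   # log base 4 of 5
print(round(alpha_80_20, 3))                   # 1.161
print(round(top_share(0.20, alpha_80_20), 6))  # 0.8
print(round(top_share(0.04, alpha_80_20), 6))  # 0.64 (top 20% of the top 20%)
```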

Relation to Zipf's law

Zipf's law

Relation to the exponential distribution

The Pareto distribution is related to the exponential distribution as follows. If X is Pareto-distributed with minimum x m and index α, then

$Y = \ln\left(\frac{X}{x_m}\right)$

is exponentially distributed with intensity α. Equivalently, if Y is exponentially distributed with intensity α, then

$x_m e^{Y}$

is Pareto-distributed with minimum x m and index α.
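Both directions of this relationship can be checked by simulation. The sketch below assumes NumPy, whose `Generator.exponential` is parameterized by the scale $1/\alpha$ rather than the intensity $\alpha$; the chosen values $x_m = 2$, $\alpha = 1.5$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
x_m, alpha, n = 2.0, 1.5, 10**6

# Pareto -> exponential: Y = ln(X / x_m) should have mean 1 / alpha.
x = x_m * (1.0 - rng.random(n)) ** (-1.0 / alpha)
y = np.log(x / x_m)
print(y.mean(), 1.0 / alpha)

# Exponential -> Pareto: x_m * exp(Y) should satisfy Pr(X > t) = (x_m / t)**alpha.
y2 = rng.exponential(scale=1.0 / alpha, size=n)  # NumPy uses scale = 1 / intensity
x2 = x_m * np.exp(y2)
t = 5.0
print(np.mean(x2 > t), (x_m / t) ** alpha)
```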

Lorenz Curve (background)

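Let $f$ be any pdf on $[x_m, \infty)$ with finite mean $\mu = \int_{x_m}^{\infty} x f(x)\,dx$. Then the associated Lorenz curve $L(F)$, written in terms of the quantile function $x(F)$, is

$L(F) = \frac{\int_{x_m}^{x(F)} x f(x)\,dx}{\mu}$

Here $x(F)$ is the quantile function, the inverse of the CDF. Recall that $F(x)$ is the fraction of the population with wealth no bigger than $x$, so $x(F)$ is the maximum wealth found in the poorest fraction $F$ of the population. The numerator is therefore the total wealth held by that subpopulation, divided by the size of the whole population; the denominator is the average wealth of the whole population. Their ratio $L(F)$ is the share of total wealth held by the poorest fraction $F$.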

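Correspondingly, for any cdf $F$, substituting $F' = F(x)$ (so that $dF' = f(x)\,dx$) gives the Lorenz curve in terms of the quantile function alone:

$L(F) = \frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'}$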
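The quantile form lends itself to numerical evaluation. The sketch below (hypothetical helper `lorenz_from_quantile`, NumPy assumed) approximates both integrals with the midpoint rule and checks it on wealth uniform on $[0, 1]$, where $x(F) = F$ and $L(F) = F^2$. For quantile functions that diverge as $F \to 1$, such as the Pareto one, the midpoint rule still avoids the endpoint but converges slowly, so an analytic denominator is preferable there.

```python
import numpy as np

def lorenz_from_quantile(quantile, F, n=100_000):
    """L(F) = int_0^F x(F') dF' / int_0^1 x(F') dF', both integrals by the midpoint rule."""
    def integral(upper):
        mid = (np.arange(n) + 0.5) * (upper / n)  # midpoints of n equal subintervals of [0, upper]
        return np.sum(quantile(mid)) * (upper / n)
    return integral(F) / integral(1.0)

# Uniform wealth on [0, 1]: x(F) = F, so L(F) = (F**2 / 2) / (1 / 2) = F**2.
for F in (0.25, 0.5, 0.9):
    print(F, lorenz_from_quantile(lambda q: q, F), F**2)
```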

Gini

Recall that the Gini coefficient measures the deviation of the Lorenz curve from perfect equality as twice the area between the Lorenz curve and the equidistribution line.

$G = 1 - 2\int_0^1 L(F)\,dF$

See Aaberge (2005) http://ideas.repec.org/p/ssb/dispap/491.html
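For data rather than a formula, the sorted sample plays the role of the quantile function: the poorest $i$ of $n$ people hold the share $\sum_{j \le i} x_{(j)} / \sum_j x_j$. The sketch below (hypothetical helper names, NumPy assumed) builds these Lorenz points and approximates $G = 1 - 2\int_0^1 L(F)\,dF$ with the trapezoid rule. Equal wealth gives $G = 0$; when one of $n$ people owns everything, this discrete version gives $(n-1)/n$ rather than exactly 1.

```python
import numpy as np

def empirical_lorenz(sample):
    """Points (F_i, L_i) of the empirical Lorenz curve: the poorest fraction F_i holds share L_i."""
    x = np.sort(np.asarray(sample, dtype=float))
    F = np.arange(x.size + 1) / x.size
    L = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return F, L

def gini_from_lorenz(F, L):
    """G = 1 - 2 * (area under L), with the area computed by the trapezoid rule."""
    area = np.sum((L[1:] + L[:-1]) / 2.0 * np.diff(F))
    return 1.0 - 2.0 * area

print(gini_from_lorenz(*empirical_lorenz([5, 5, 5, 5])))  # 0.0  (equal wealth)
print(gini_from_lorenz(*empirical_lorenz([0, 0, 0, 1])))  # 0.75 (one person owns everything)
```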

Lorenz Curve and Gini Coefficient for the Pareto Distribution

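For the Pareto distribution, assuming a finite mean (i.e., $\alpha > 1$, so that the Lorenz curve exists), inverting $F = 1 - (x_m/x)^{\alpha}$ gives the quantile function

$x(F) = \frac{x_m}{(1-F)^{1/\alpha}}$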

So the Lorenz curve is calculated to be

$L(F) = 1 - (1-F)^{1 - 1/\alpha}$
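The integration step behind this result, assuming $\alpha > 1$ so that $(1-F')^{1-1/\alpha} \to 0$ as $F' \to 1$ and both integrals converge:

```latex
\begin{aligned}
\int_0^F x(F')\,dF' &= x_m \int_0^F (1-F')^{-1/\alpha}\,dF'
  = \frac{x_m}{1 - 1/\alpha}\left[1 - (1-F)^{1-1/\alpha}\right],\\
\int_0^1 x(F')\,dF' &= \frac{x_m}{1 - 1/\alpha} = \frac{\alpha x_m}{\alpha - 1} = E(X),\\
L(F) &= \frac{\int_0^F x(F')\,dF'}{\int_0^1 x(F')\,dF'} = 1 - (1-F)^{1-1/\alpha}.
\end{aligned}
```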

Lorenz Curves for Pareto Distributions

images/pareto-lorenz.png

Lorenz Curves for Pareto Distributions

Compare Mathematica figure.

Gini Coefficient for Pareto Distributions

$G = \frac{1}{2\alpha - 1}$
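Substituting the Pareto Lorenz curve into $G = 1 - 2\int_0^1 L(F)\,dF$ gives $G = 1 - 2\left(1 - \frac{1}{2 - 1/\alpha}\right) = \frac{1}{2\alpha - 1}$. The sketch below compares this with the sample Gini coefficient $\sum_i (2i - n - 1)x_{(i)} / (n \sum_i x_i)$ of simulated Pareto data ($x_m = 1$; NumPy assumed; the helper name is hypothetical). It uses $\alpha > 2$ so the variance is finite and the estimate settles quickly; for $1 < \alpha \le 2$ the sample Gini converges more slowly.

```python
import numpy as np

def sample_gini(sample):
    """Sample Gini: sum_i (2i - n - 1) * x_(i) / (n * sum(x)), with x sorted ascending."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * x.sum())

rng = np.random.default_rng(4)
results = {}
for alpha in (2.5, 3.0):
    x = (1.0 - rng.random(10**6)) ** (-1.0 / alpha)  # Pareto samples with x_m = 1
    results[alpha] = sample_gini(x)
    print(alpha, results[alpha], 1.0 / (2.0 * alpha - 1.0))
```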

Note: $\alpha \to \infty$ gives Gini $\to 0$, and $\alpha = 1$ gives Gini $= 1$.

Maximum Likelihood Estimation

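The likelihood function for the Pareto index and cutoff parameters, given a sample $x = (x_1, x_2, \ldots, x_n)$ with every $x_i \ge x_m$, is

$L(\alpha, x_m) = \prod_{i=1}^{n} \frac{\alpha x_m^{\alpha}}{x_i^{\alpha+1}} = \alpha^n x_m^{n\alpha} \prod_{i=1}^{n} \frac{1}{x_i^{\alpha+1}}$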

Therefore, the log-likelihood function is

$\ell(\alpha, x_m) = n \ln\alpha + n\alpha \ln x_m - (\alpha + 1) \sum_{i=1}^{n} \ln x_i$

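Estimating $x_m$: for $\alpha > 0$, $\ell(\alpha, x_m)$ is strictly increasing in $x_m$. Since every observation must satisfy $x_i \ge x_m$, pick the largest admissible value

$\hat{x}_m = \min_i x_i$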

Estimating the Pareto Index

To find the estimator for α, take the corresponding partial derivative and find α that makes it zero:

$\frac{\partial \ell}{\partial \alpha} = \frac{n}{\alpha} + n \ln x_m - \sum_{i=1}^{n} \ln x_i = 0.$

Thus the maximum likelihood estimator for α is:

$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \left(\ln x_i - \ln \hat{x}_m\right)}.$
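A direct translation of the two estimators into Python, as a sketch (NumPy assumed; `pareto_mle` is a hypothetical name), tested on simulated data with $x_m = 3$ and $\alpha = 2.5$:

```python
import numpy as np

def pareto_mle(sample):
    """Maximum likelihood estimates (x_m_hat, alpha_hat) for a Pareto sample."""
    x = np.asarray(sample, dtype=float)
    x_m_hat = x.min()  # the likelihood increases in x_m up to min(x_i)
    alpha_hat = x.size / np.sum(np.log(x) - np.log(x_m_hat))
    return x_m_hat, alpha_hat

rng = np.random.default_rng(5)
true_x_m, true_alpha = 3.0, 2.5
sample = true_x_m * (1.0 - rng.random(100_000)) ** (-1.0 / true_alpha)  # inverse-transform draws
x_m_hat, alpha_hat = pareto_mle(sample)
print(x_m_hat, alpha_hat)
```

NumPy's `Generator.pareto(a)` draws from the Lomax (Pareto II) form; adding 1 and multiplying by $x_m$ gives the classical Pareto distribution used in these notes, which is why the sketch samples by inverse transform instead.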

Sampling from Power Law (Pareto) Distributions

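$X = \frac{x_m}{U^{1/\alpha}}$, where $U$ is uniform on $(0, 1]$.

This is inverse-transform sampling: for $x \ge x_m$, $\Pr(X > x) = \Pr\left(U < (x_m/x)^{\alpha}\right) = (x_m/x)^{\alpha}$, which is the Pareto survival function.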

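import numpy as np

nsamples = 10**6
alpha = 2.5 - 1.0  # pdf exponent a = 2.5, so Pareto index alpha = a - 1 = 1.5
u = 1 - np.random.random(nsamples)  # U(0,1]: excluding 0 keeps u**(-1/alpha) finite
x = u**(-1.0/alpha)  # Pareto distributed with x_m = 1 (inverse-transform sampling)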

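# continues from the block above: uses np, x and nsamples
lastbin01 = 10
bins01 = np.linspace(1, lastbin01, 91)  # 91 edges -> 90 bins of width 0.1 on [1, 10]
ind01 = np.digitize(x, bins01)  # bin i covers [bins01[i-1], bins01[i]); 91 means x >= 10
freq01 = np.bincount(ind01, minlength=len(bins01) + 1)[1:-1]  # trim the unbinned
relfreq01 = freq01 / nsamples  # fraction of *all* samples falling in each bin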
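As a sanity check on the binned frequencies, each bin $[b_i, b_{i+1})$ should receive the fraction $\Pr(b_i \le X < b_{i+1}) = b_i^{-\alpha} - b_{i+1}^{-\alpha}$ of all samples (with $x_m = 1$ and $\alpha = 1.5$, matching the code above). This self-contained sketch redraws the samples with a seeded generator and uses `np.histogram` on the same 0.1-wide bins in place of the `digitize`/`bincount` pair.

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, nsamples = 1.5, 10**6
x = (1.0 - rng.random(nsamples)) ** (-1.0 / alpha)  # Pareto with x_m = 1

edges = np.linspace(1, 10, 91)                        # 90 bins of width 0.1 on [1, 10]
counts, _ = np.histogram(x, bins=edges)
relfreq = counts / nsamples                           # fraction of all samples per bin
expected = edges[:-1] ** -alpha - edges[1:] ** -alpha # Pr(lo <= X < hi) = S(lo) - S(hi)
print(np.max(np.abs(relfreq - expected)))
```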

images/pareto-smallvals.png

Pareto Distribution: Smaller Values are Common

images/pareto-largevals.png

Pareto Distribution: Larger Values are Rare

References

Barry C. Arnold (1983). “Pareto Distributions”, International Co-operative Publishing House, Burtonsville, Maryland. ISBN 0-899974-012-1.

Christian Kleiber and Samuel Kotz (2003). Statistical Size Distributions in Economics and Actuarial Sciences, New York:Wiley. xi+332 pp. ISBN 0-471-15064-9.

Lorenz, M. O. (1905). "Methods of measuring the concentration of wealth". Publications of the American Statistical Association. 9: 209–219.

Pareto, Vilfredo, Cours d’Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino, Librairie Droz, Geneva, 1964, pages 299–345.

Reed, William J. “The Pareto, Zipf and other power laws,” http://linkage.rockefeller.edu/wli/zipf/reed01_el.pdf

Aaberge, Rolf. “Gini's Nuclear Family” In: International Conference to Honor Two Eminent Social Scientists, May, 2005 http://www.unisi.it/eventi/GiniLorenz05/ http://www.unisi.it/eventi/GiniLorenz05/25%20may%20paper/PAPER_Aaberge.pdf

Michael Hardy (2010) "Pareto's Law", Mathematical Intelligencer, 32 (3), 38–43. doi: 10.1007/s00283-010-9159-2

M. E. J. Newman, 2005, “Power laws, Pareto distributions and Zipf's law” Contemporary Physics 46, pages 323–351. doi:10.1080/00107510500052444 http://arxiv.org/abs/cond-mat/0412004v3

[newman-2006-archiv:cond-mat]Newman, M E J. 2006. Power Laws, Pareto Distributions and Zipf’s Law. arXiv preprint cond-mat.stat-mech, 0412004v3.