The Gini coefficient is a measure of statistical dispersion developed
by the Italian statistician Corrado Gini and published in his 1912
paper "Variability and Mutability". It is commonly used as a measure
of inequality of income or wealth. It has, however, also found
application in the study of inequalities in disciplines as diverse
as health science, ecology, and chemistry.
Definition
Graphical representation of the Gini
coefficient.
The graph shows that while the Gini is technically equal to the
area marked 'A' divided by the sum of the areas marked 'A' and 'B'
(that is, Gini = A/(A+B)), it is also equal to 2*A, because
A+B = 0.5: the axes scale from 0 to 1, so the whole graph has an
area of 1 and the triangle under the line of equality has an area
of 0.5.
The Gini coefficient is usually defined mathematically based on the
Lorenz curve (below). It can be thought of as the ratio of the area
that lies between the line of equality and the Lorenz curve (marked
'A' in the diagram) over the total area under the line of equality
(marked 'A' and 'B' in the diagram); i.e., G = A/(A+B).
The Gini coefficient can range from 0 to 1; it is sometimes
multiplied by 100 to range between 0 and 100. A low Gini
coefficient indicates a more equal distribution, with 0
corresponding to perfect equality, while higher Gini coefficients
indicate more unequal distribution, with 1 corresponding to perfect
inequality. To be validly computed, no negative goods can be
distributed. Thus, if the Gini coefficient is being used to describe
household income inequality, then no household can have a negative
income. When used as a measure of income inequality, the
most unequal society will be one in which a single person receives
100% of the total income and the remaining people receive none
(G=1); and the most equal society will be one in which every person
receives the same percentage of the total income (G=0).
Some find it more intuitive (and it is mathematically equivalent)
to think of the Gini coefficient as half of the relative mean
difference. The mean difference is the average absolute difference
between two items selected randomly from a population, and the
relative mean difference is the mean difference divided by the
average, to normalize for scale.
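The equivalence can be checked directly by averaging |y_i - y_j| over every ordered pair of values. This is a minimal O(n²) sketch, assuming the population form (two items drawn with replacement, so the divisor is n²) and non-negative values with a positive mean; the function name is hypothetical.

```python
from itertools import product


def gini_from_mean_difference(values: list[float]) -> float:
    """Gini coefficient as half the relative mean difference.

    Population form: |y_i - y_j| is averaged over all n * n ordered pairs,
    i.e. two items drawn at random with replacement.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("mean must be positive")
    mean_difference = sum(abs(a - b) for a, b in product(values, repeat=2)) / n**2
    relative_mean_difference = mean_difference / mean  # normalize for scale
    return relative_mean_difference / 2


print(gini_from_mean_difference([1, 2, 3, 4]))  # 0.25
```

The pairwise form is easy to reason about but quadratic in n; the sorted-sum formulas in the Calculation section give the same value in O(n log n).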
Worldwide, Gini coefficients for income range from approximately
0.230 in Sweden to 0.707 in Namibia, although not every country has
been assessed.
As a mathematical measure of inequality, the Gini coefficient
carries no moral judgement about whether a particular level of
equality or inequality is good or bad.
Different uses
Although the Gini coefficient is most popular in economics, it can
in theory be applied in any field of science that studies a
distribution. For example, in ecology the Gini coefficient has been
used as a measure of biodiversity, where the cumulative proportion
of species is plotted against the cumulative proportion of
individuals. In health, it has been used as a measure of the
inequality of health-related quality of life in a population. In
chemistry it has been used to express the selectivity of protein
kinase inhibitors against a panel of kinases.
Calculation
The Gini index is defined as a ratio of the areas on the Lorenz
curve diagram. If the area between the line of perfect equality and
the Lorenz curve is A, and the area under the Lorenz curve is B,
then the Gini index is A/(A+B). Since A+B = 0.5, the Gini index is
G = A/0.5 = 2A = 1 - 2B. If the Lorenz curve is represented by the
function Y = L(X), the value of B can be found with integration and:
 G = 1 - 2\int_0^1 L(X)\,dX.
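When L(X) is available in closed form, the integral can be approximated numerically. This sketch uses a simple midpoint rule; the function name and the example curves L(X) = X² and L(X) = X³ are illustrative assumptions, chosen because the formula above gives them Gini coefficients of exactly 1/3 and 1/2.

```python
def gini_from_lorenz(lorenz, steps: int = 100_000) -> float:
    """G = 1 - 2 * (integral of L(X) from 0 to 1), via the midpoint rule.

    `lorenz` is any callable L(X) on [0, 1] with L(0) = 0 and L(1) = 1.
    """
    h = 1.0 / steps
    # Area B under the Lorenz curve, sampled at the midpoint of each sub-interval.
    area_b = h * sum(lorenz((k + 0.5) * h) for k in range(steps))
    return 1.0 - 2.0 * area_b


print(round(gini_from_lorenz(lambda x: x**2), 6))  # 0.333333
print(round(gini_from_lorenz(lambda x: x**3), 6))  # 0.5
```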
In some cases, this equation can be applied to calculate the Gini
coefficient without direct reference to the Lorenz curve. For
example:
 For a population uniform on the values y_{i}, i = 1 to n, indexed
in nondecreasing order (y_{i} ≤ y_{i+1}):
 G = \frac{1}{n}\left( n+1 - 2\left( \frac{\sum_{i=1}^n (n+1-i)\,y_i}{\sum_{i=1}^n y_i} \right) \right)
 This may be simplified to:
 G = \frac{2\sum_{i=1}^n i\,y_i}{n\sum_{i=1}^n y_i} - \frac{n+1}{n}
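The simplified formula needs only one sort and two sums. A minimal sketch, assuming non-negative values with a positive total; the function name gini_population is hypothetical.

```python
def gini_population(values: list[float]) -> float:
    """G = 2*sum(i*y_i) / (n*sum(y_i)) - (n+1)/n, y sorted ascending, i = 1..n."""
    y = sorted(values)  # index in nondecreasing order
    n = len(y)
    total = sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini_population([1, 2, 3, 4]))  # 0.25
print(gini_population([0, 0, 0, 1]))  # 0.75
```

The second call shows that in a finite population of n people, one person holding all the income gives (n - 1)/n rather than exactly 1; the value approaches 1 as n grows.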
 For a discrete probability function f(y), where y_{i}, i = 1 to n,
are the points with nonzero probabilities and which are indexed in
increasing order (y_{i} < y_{i+1}):
 G = 1 - \frac{\sum_{i=1}^n f(y_i)\,(S_{i-1}+S_i)}{S_n}
 where
 S_i = \sum_{j=1}^i f(y_j)\,y_j, and S_0 = 0.
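The discrete formula translates into a single pass over the support points in increasing order, accumulating the partial sums S_i. This is a sketch: the name gini_discrete is hypothetical and the probabilities are assumed to sum to 1. With equal probabilities it agrees with the uniform-population formula, e.g. 0.25 for the values 1, 2, 3, 4.

```python
def gini_discrete(points: list[float], probs: list[float]) -> float:
    """G = 1 - sum_i f(y_i)(S_{i-1} + S_i) / S_n for a discrete distribution.

    S_i = sum_{j<=i} f(y_j) y_j with S_0 = 0, so S_n is the mean.
    """
    if len(points) != len(probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("need one probability per point, summing to 1")
    s_prev = 0.0  # S_0
    total = 0.0
    for y, f in sorted(zip(points, probs)):  # index the y_i in increasing order
        s_curr = s_prev + f * y  # S_i
        total += f * (s_prev + s_curr)
        s_prev = s_curr
    if s_prev <= 0:
        raise ValueError("mean S_n must be positive")
    return 1.0 - total / s_prev  # s_prev now holds S_n


print(gini_discrete([1, 2, 3, 4], [0.25] * 4))  # 0.25
print(gini_discrete([0, 1], [0.5, 0.5]))  # 0.5
```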
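 For a cumulative distribution function F(y) that has a mean μ and
is zero for all negative values of y:
 G = 1 - \frac{1}{\mu}\int_0^\infty (1-F(y))^2\,dy = \frac{1}{\mu}\int_0^\infty F(y)(1-F(y))\,dy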
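The second integral form can be checked numerically. The sketch below applies a midpoint rule to (1/μ)∫F(y)(1 - F(y))dy, truncating the upper limit at a value where F is essentially 1 (an assumption the caller must ensure). The two example distributions are illustrative choices: the exponential distribution has a Gini coefficient of exactly 1/2 and the uniform distribution on [0, 1] one of 1/3.

```python
import math


def gini_continuous(cdf, mean: float, upper: float, steps: int = 200_000) -> float:
    """G = (1/mu) * integral_0^inf F(y)(1 - F(y)) dy, truncated at `upper`.

    `upper` must be large enough that F(upper) is essentially 1.
    """
    h = upper / steps
    integral = 0.0
    for k in range(steps):
        y = (k + 0.5) * h  # midpoint rule
        f = cdf(y)
        integral += f * (1.0 - f)
    return integral * h / mean


def exponential_cdf(y: float) -> float:
    return 1.0 - math.exp(-y)  # exponential with rate 1, mean 1


def uniform_cdf(y: float) -> float:
    return min(max(y, 0.0), 1.0)  # uniform on [0, 1], mean 1/2


print(round(gini_continuous(exponential_cdf, 1.0, upper=50.0), 4))  # 0.5
print(round(gini_continuous(uniform_cdf, 0.5, upper=1.0), 4))  # 0.3333
```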
 Since the Gini coefficient is half the relative mean difference, it
can also be calculated using formulas for the relative mean
difference. For a random sample S consisting of values y_{i}, i = 1
to n, that are indexed in nondecreasing order (y_{i} ≤ y_{i+1}), the
statistic:
 G(S) = \frac{1}{n-1}\left( n+1 - 2\left( \frac{\sum_{i=1}^n (n+1-i)\,y_i}{\sum_{i=1}^n y_i} \right) \right)
 is a consistent estimator of the population Gini coefficient, but
is not, in general, unbiased. Like G, G(S) has a simpler form:
 G(S) = 1 - \frac{2}{n-1}\left( n - \frac{\sum_{i=1}^n i\,y_i}{\sum_{i=1}^n y_i} \right).
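The simpler form of G(S) is a short computation once the values are sorted. Comparing it with the population formula shows G(S) = n/(n - 1)·G; the sketch (hypothetical name gini_sample) reflects this: for 1, 2, 3, 4 it gives 1/3 instead of 0.25, and for a single earner among four people it gives exactly 1.

```python
def gini_sample(values: list[float]) -> float:
    """G(S) = 1 - 2/(n-1) * (n - sum(i*y_i)/sum(y_i)), y sorted ascending."""
    y = sorted(values)
    n = len(y)
    total = sum(y)
    if n < 2 or total <= 0:
        raise ValueError("need at least two values and a positive total")
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 1 - 2 / (n - 1) * (n - weighted / total)


print(gini_sample([1, 2, 3, 4]))  # approximately 0.3333
print(gini_sample([0, 0, 0, 1]))  # 1.0
```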
As with the relative mean difference, there does not exist a sample
statistic that is in general an unbiased estimator of the population
Gini coefficient.
Sometimes the entire Lorenz curve is not known, and only values at
certain intervals are given. In that case, the Gini coefficient can
be approximated by using various techniques for interpolating the
missing values of the Lorenz curve. If (X_{k}, Y_{k}) are the known
points on the Lorenz curve, with the X_{k} indexed in increasing
order (X_{k-1} < X_{k}), so that:
 X_{k} is the cumulative proportion of the population variable, for
k = 0,...,n, with X_{0} = 0, X_{n} = 1.
 Y_{k} is the cumulative proportion of the income variable, for
k = 0,...,n, with Y_{0} = 0, Y_{n} = 1.
If the Lorenz curve is approximated on each interval as a line
between consecutive points, then the area B can be approximated with
trapezoids and:
 G_1 = 1 - \sum_{k=1}^{n} (X_{k} - X_{k-1})(Y_{k} + Y_{k-1})
is the resulting approximation for G. More accurate results can be
obtained using other methods to approximate the area B, such as
approximating the Lorenz curve with a quadratic function across
pairs of intervals, or building an appropriately smooth
approximation to the underlying distribution function that matches
the known data. If the population mean and boundary values for each
interval are also known, these can also often be used to improve the
accuracy of the approximation.
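G_1 is a direct sum over consecutive Lorenz points. The sketch below assumes the point lists already include (0, 0) and (1, 1); the function name and the quintile shares in the example are hypothetical.

```python
def gini_trapezoid(xs: list[float], ys: list[float]) -> float:
    """G_1 = 1 - sum_{k=1}^{n} (X_k - X_{k-1}) (Y_k + Y_{k-1})."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need matching point lists with at least two points")
    if xs[0] != 0 or ys[0] != 0 or xs[-1] != 1 or ys[-1] != 1:
        raise ValueError("Lorenz points must start at (0, 0) and end at (1, 1)")
    # Each term is twice the area of one trapezoid under the interpolated curve.
    return 1.0 - sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) for k in range(1, len(xs))
    )


# Hypothetical quintile data: cumulative population share vs cumulative income share.
X = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
Y = [0.0, 0.05, 0.15, 0.30, 0.55, 1.0]
print(round(gini_trapezoid(X, Y), 4))  # 0.38
```

When the points are the exact vertices of an empirical Lorenz curve (for example X = 0, 0.25, 0.5, 0.75, 1 and Y = 0, 0.1, 0.3, 0.6, 1 for the incomes 1, 2, 3, 4), G_1 reproduces the uniform-population value 0.25.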
The Gini coefficient calculated from a sample is a statistic, and
its standard error, or confidence intervals for the population Gini
coefficient, should be reported. These can be calculated using
bootstrap techniques, but those proposed have been mathematically
complicated and computationally onerous even in an era of fast
computers. Ogwang (2000) made the process more efficient by setting
up a “trick regression model” in which the incomes in the sample
are ranked with the lowest income being allocated rank 1. The model
then expresses the rank (dependent variable) as the sum of a
constant A and a normal error term whose variance is inversely
proportional to y_{k}:
 k = A + N(0, s^{2}/y_{k})
Ogwang showed that G can be expressed as a function of the weighted
least squares estimate of the constant A and that this can be used
to speed up the calculation of the jackknife estimate for the
standard error. Giles (2004) argued that the standard error of the
estimate of A can be used to derive that of the estimate of G
directly without using a jackknife at all. This method only requires
the use of ordinary least squares regression after ordering the
sample data. The results compare favorably with the estimates from
the jackknife, with agreement improving with increasing sample size.
The paper describing this method can be found here:
http://web.uvic.ca/econ/ewp0202.pdf
However, it has since been argued that this is dependent on the
model's assumptions about the error distributions (Ogwang 2004) and
the independence of error terms (Reza & Gastwirth 2006), and that
these assumptions are often not valid for real data sets. It may
therefore be better to stick with jackknife methods such as those
proposed by Yitzhaki (1991) and Karagiannis and Kovacevic (2000).
The debate continues.
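A plain leave-one-out jackknife is straightforward to sketch. It is not Ogwang's regression shortcut, nor the specific estimators of Yitzhaki (1991) or Karagiannis and Kovacevic (2000); it simply recomputes the sample Gini G(S) n times, each time dropping one observation, and applies the standard jackknife variance formula. The function names and the sample incomes are hypothetical.

```python
import math


def gini_sample(values):
    """Sample Gini G(S) = 1 - 2/(n-1) * (n - sum(i*y_i)/sum(y_i)), y ascending."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 1 - 2 / (n - 1) * (n - weighted / sum(y))


def jackknife_se(values):
    """Leave-one-out jackknife standard error of G(S).

    Assumes at least three observations and a positive total after any
    single value is dropped.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # One replicate per observation, computed on the sample without it.
    replicates = [gini_sample(values[:i] + values[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    variance = (n - 1) / n * sum((g - mean_rep) ** 2 for g in replicates)
    return math.sqrt(variance)


sample = [12.0, 15.0, 9.0, 30.0, 22.0, 18.0, 50.0, 11.0]  # hypothetical incomes
print(gini_sample(sample), jackknife_se(sample))
```

Each replicate costs a sort, so this direct version is O(n² log n); the regression-based shortcuts discussed above exist precisely to avoid that cost on large samples.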
The Gini coefficient can be calculated if you know the mean of a
distribution, the number of people (or percentiles), and the income
of each person (or percentile). Princeton development economist
Angus Deaton (1997, 139) simplified the Gini calculation to a single
formula:
 G = \frac{N+1}{N-1} - \frac{2}{N(N-1)u}\left( \sum_{i=1}^N P_i X_i \right)
where u is the mean income of the population and P_{i} is the income
rank of person i, with income X_{i}, such that the richest person
receives a rank of 1 and the poorest a rank of N. This effectively
gives higher weight to poorer people in the income distribution,
which allows the Gini to meet the Transfer Principle.
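A direct transcription of Deaton's formula, assuming incomes are given person by person rather than as grouped percentiles; ties can be ranked in any order without changing the sum. The function name gini_deaton is hypothetical. Substituting P_i = N + 1 - i for the ascending index i shows that the formula reduces algebraically to the sample statistic G(S) above, so for the incomes 1, 2, 3, 4 it also returns 1/3.

```python
def gini_deaton(incomes):
    """Deaton's formula G = (N+1)/(N-1) - 2/(N(N-1)u) * sum(P_i * X_i).

    P_i is the income rank with the richest person ranked 1 and the
    poorest ranked N; u is the mean income.
    """
    n = len(incomes)
    if n < 2:
        raise ValueError("need at least two incomes")
    u = sum(incomes) / n
    richest_first = sorted(incomes, reverse=True)
    rank_weighted = sum(p * x for p, x in enumerate(richest_first, start=1))
    return (n + 1) / (n - 1) - 2 / (n * (n - 1) * u) * rank_weighted


print(gini_deaton([1, 2, 3, 4]))  # approximately 0.3333
```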
Income Gini indices in the world
A complete listing is in list of countries by income equality; the
article economic inequality discusses the social and policy aspects
of income and asset inequality.
Gini coefficient, income distribution by country (CIA report, 2009).
While most developed European nations and Canada tend to have Gini
indices between 24 and 36, the Gini indices of the United States and
Mexico are both above 40, indicating greater inequality. Using the
Gini can help quantify differences in welfare and compensation
policies and philosophies. However, it should be borne in mind that
the Gini coefficient can be misleading when used to make political
comparisons between large and small countries (see criticisms
section).
The Gini index for the entire world has been estimated by various
parties to be between 56 and 66.
Gini indices, income distribution over
time for selected countries
US income Gini indices over time
Gini indices for the United States at various times, according to
the US Census Bureau:
 1929: 45.0 (estimated)
 1947: 37.6 (estimated)
 1967: 39.7 (first year reported)
 1968: 38.6 (lowest index reported)
 1970: 39.4
 1980: 40.3
 1990: 42.8
 2000: 46.2
 2005: 46.9
 2006: 47.0 (highest index reported)
 2007: 46.3
EU Gini index
In 2005 the Gini index for the EU was estimated at 31.
Advantages of Gini coefficient as a measure of inequality
 It can be used to compare income distributions across different
population sectors as well as countries, for example the Gini
coefficient for urban areas differs from that of rural areas in
many countries (though the United States' urban and rural Gini
coefficients are nearly identical).
 It is sufficiently simple that it can be compared across
countries and be easily interpreted. GDP statistics are often
criticized as they do not represent changes for the whole
population; the Gini coefficient demonstrates how income has
changed for poor and rich. If the Gini coefficient is rising as
well as GDP, poverty may not be improving for the majority of the
population.
 The Gini coefficient can be used to indicate how the
distribution of income has changed within a country over a period
of time, thus it is possible to see if inequality is increasing or
decreasing.
 The Gini coefficient satisfies four important principles:
 Anonymity: it does not matter who the high and low
earners are.
 Scale independence: the Gini coefficient does not
consider the size of the economy, the way it is measured, or
whether it is a rich or poor country on average.
 Population independence: it does not matter how large
the population of the country is.
 Transfer principle: if income (less than the difference between
the two) is transferred from a rich person to a poor person, the
resulting distribution is more equal.
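The four principles can be checked numerically. The sketch below uses the population formula G = 2Σ i·y_i/(nΣ y_i) - (n+1)/n on hypothetical incomes. The population-independence check relies on that population form: the sample statistic G(S) carries an n/(n - 1) factor and so changes slightly when the population is cloned.

```python
import math


def gini(values):
    """Population Gini via the sorted formula (values ascending, i = 1..n)."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * sum(y)) - (n + 1) / n


incomes = [1.0, 2.0, 3.0, 4.0]  # hypothetical incomes
base = gini(incomes)

# Anonymity: reordering who earns what changes nothing.
assert math.isclose(gini([4.0, 1.0, 3.0, 2.0]), base)
# Scale independence: doubling every income (e.g. a currency change) changes nothing.
assert math.isclose(gini([2 * y for y in incomes]), base)
# Population independence: cloning the whole population three times changes nothing.
assert math.isclose(gini(incomes * 3), base)
# Transfer principle: moving 0.5 from the richest to the poorest lowers G.
assert gini([1.5, 2.0, 3.0, 3.5]) < base
print(base)  # 0.25
```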
Disadvantages of Gini coefficient as a measure of
inequality
 The Gini coefficient of different sets of people cannot be
averaged to obtain the Gini coefficient of all the people in the
sets: if a Gini coefficient were to be calculated for each person
it would always be zero. For a large, economically diverse country,
a much higher coefficient will be calculated for the country as a
whole than will be calculated for each of its regions. (The
coefficient is usually applied to measurable nominal income rather
than local purchasing power,
tending to increase the calculated coefficient across larger
areas.)
 For this reason, the scores calculated for individual countries
within the EU are difficult to compare with the score of the entire
US: the overall value for the EU should be used in that case, 31.3,
which is still much lower than the United States', 45. Using
decomposable inequality measures (e.g. the Theil index T converted
by 1 - e^{-T} into an inequality coefficient) averts such problems.
 The Lorenz curve may understate the actual amount of inequality
if richer households are able to use income more efficiently than
lower income households or vice versa. From another point of view,
measured inequality may be the result of more or less efficient use
of household incomes.
 Economies with similar incomes and Gini coefficients can still
have very different income distributions. This is because the
Lorenz curves can have different shapes and yet still yield the
same Gini coefficient.
 It measures current income rather than lifetime income. A
society in which everyone earned the same over a lifetime would
appear unequal because of people at different stages in their life;
a society in which students study rather than save can never have a
coefficient of 0. However, the Gini coefficient can also be
calculated for any kind of distribution, e.g. for wealth.
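The following sketch, on hypothetical data, illustrates the first point in this list and the Theil alternative: two perfectly equal regions each have a Gini of 0, yet the country formed by pooling them does not. The Theil index T, by contrast, splits exactly into a within-region and a between-region part, T = Σ s_g T_g + Σ s_g ln(μ_g/μ), where s_g is a region's share of total income (the standard Theil decomposition, stated here as background rather than taken from the text). The last line applies the 1 - e^{-T} conversion mentioned above. All names are illustrative.

```python
import math


def gini(values):
    """Population Gini via the sorted formula (values ascending, i = 1..n)."""
    y = sorted(values)
    n = len(y)
    weighted = sum(i * yi for i, yi in enumerate(y, start=1))
    return 2 * weighted / (n * sum(y)) - (n + 1) / n


def theil_t(values):
    """Theil T = (1/N) * sum (y/mu) * ln(y/mu); incomes must be positive."""
    mu = sum(values) / len(values)
    return sum((y / mu) * math.log(y / mu) for y in values) / len(values)


region_a = [10.0, 10.0]  # hypothetical, perfectly equal region
region_b = [30.0, 30.0]  # hypothetical, perfectly equal but richer region
country = region_a + region_b

print(gini(region_a), gini(region_b), gini(country))  # 0.0 0.0 0.25

# Theil decomposition: T = sum_g s_g * T_g + sum_g s_g * ln(mu_g / mu)
mu = sum(country) / len(country)
within = between = 0.0
for group in (region_a, region_b):
    share = sum(group) / sum(country)  # s_g: the group's share of total income
    mu_g = sum(group) / len(group)
    within += share * theil_t(group)
    between += share * math.log(mu_g / mu)

total = theil_t(country)
assert math.isclose(total, within + between)
print(1 - math.exp(-total))  # Theil T mapped to an inequality coefficient in [0, 1)
```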
Problems in using the Gini coefficient
 Gini coefficients do include investment income; however, the Gini
coefficient based on net income does not accurately reflect
differences in wealth, a possible source of misinterpretation. For
example, Sweden has a low Gini coefficient for income distribution
but a significantly higher Gini coefficient for wealth (still low by
international standards, but significantly higher than for income:
for instance, 77% of the share value owned by households is held by
just 5% of Swedish shareholding households). In other words, the
Gini income coefficient should not be interpreted as measuring
effective egalitarianism.
 Too often only the Gini coefficient is quoted without
describing the proportions of the quantiles used for measurement.
As with other inequality coefficients, the Gini coefficient is
influenced by the granularity of the measurements. For example,
five 20% quantiles (low granularity) will usually yield a lower
Gini coefficient than twenty 5% quantiles (high granularity) taken
from the same distribution. This is an often encountered problem
with measurements.
 Care should be taken in using the Gini coefficient as a measure of
egalitarianism, as it is properly a measure of income dispersion.
For example, if two equally egalitarian countries pursue different
immigration policies, the country accepting a higher proportion of
low-income or impoverished migrants will be assessed as less equal
(gain a higher Gini coefficient).
 The Gini coefficient is a point estimate of equality at a certain
time, hence it ignores lifespan changes in income. Typically,
increases in the proportion of young or old members of a society
will drive apparent changes in equality. Because of this, factors
such as age distribution within a population and mobility within
income classes can create the appearance of differences in equality
where none exist once epidemiological effects are taken into
account. Thus a given economy may have a higher Gini coefficient at
any one point in time compared to another, while the Gini
coefficient calculated over individuals' lifetime income is
actually lower than the apparently more equal (at a given point in
time) economy's. Essentially, what matters is not just inequality
in any particular year, but the composition of the distribution
over time.
 Countries can have the same Gini coefficient but have
completely different levels of wealth. Similarly, the Gini
coefficient as measured over time does not measure growth in
incomes.
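The granularity effect described above can be reproduced with the trapezoid approximation G_1 from the Calculation section. The sketch groups the same hypothetical incomes (y = i² for i = 1..100) into quintiles, ventiles, and finally individuals, and computes G_1 from each set of Lorenz points. Because the Lorenz curve is convex, straight chords over coarse groups lie above it and understate the area A, so G_1 rises with granularity. The helper names are hypothetical.

```python
def lorenz_points(values, groups):
    """Lorenz points at `groups` equal-sized quantile cuts of the sorted values.

    Assumes len(values) is divisible by `groups`, so every cut falls
    between two people.
    """
    y = sorted(values)
    n, total = len(y), sum(y)
    if n % groups:
        raise ValueError("len(values) must be divisible by groups")
    size = n // groups
    xs, ys = [0.0], [0.0]
    for g in range(1, groups + 1):
        xs.append(g * size / n)  # cumulative population share
        ys.append(sum(y[: g * size]) / total)  # cumulative income share
    return xs, ys


def gini_trapezoid(xs, ys):
    """G_1 = 1 - sum (X_k - X_{k-1}) (Y_k + Y_{k-1})."""
    return 1.0 - sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) for k in range(1, len(xs))
    )


incomes = [float(i * i) for i in range(1, 101)]  # hypothetical skewed incomes
for groups in (5, 20, 100):  # quintiles, ventiles, individuals
    print(groups, round(gini_trapezoid(*lorenz_points(incomes, groups)), 4))
```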
General problems of measurement
 Comparing income distributions among countries may be difficult
because benefits systems may differ. For example, some countries
give benefits in the form of money while others give food stamps,
which might not be counted by some economists and researchers as
income in the Lorenz curve and therefore not taken into account in
the Gini coefficient. The USA counts income before benefits, while
France counts it after benefits, making the USA appear slightly more
unequal vis-à-vis France than it actually is. In another example,
the USSR appeared to have relatively high income inequality: by some
estimates, in the late 1970s, the Gini coefficient of its urban
population was as high as 0.38, which is higher than many Western
countries today. This apparent inequality ignored the fact that many
benefits received by Soviet citizens were non-monetary and were
afforded regardless of income: these benefits included, among
others, free child care for children as young as 2 months, free
elementary, secondary and higher education, free cradle-to-grave
medical care, and free or heavily subsidized housing. In this
example, an accurate comparison between the 1970s USSR and Western
countries would require one to assign monetary values to such
benefits (a difficult task in the absence of free markets). Similar
problems arise whenever a comparison between pure free-market
economies and partially socialist economies is attempted. Benefits
may take various and unexpected forms: for example, major oil
producers such as Venezuela and Iran provide indirect benefits to
their citizens by subsidizing the retail price of gasoline.
 Similarly, in some societies people may have significant income in
forms other than money, for example through subsistence farming or
bartering. Like non-monetary benefits, the value of these incomes is
difficult to quantify. Different quantifications of these incomes
will yield different Gini coefficients.
 The measure will give different results when applied to
individuals instead of households. When different populations are
not measured with consistent definitions, comparison is not
meaningful.
 As for all statistics, there may be systematic and random
errors in the data. The meaning of the Gini coefficient decreases
as the data become less accurate. Also, countries may collect data
differently, making it difficult to compare statistics between
countries.
As one result of this criticism, entropy measures are frequently
used in addition to or in competition with the Gini coefficient
(e.g. the Theil index and the Atkinson index). These measures
attempt to compare the distribution of resources by intelligent
agents in the market with a maximum entropy random distribution,
which would occur if these agents acted like non-intelligent
particles in a closed system following the laws of statistical
physics.
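For comparison, the Atkinson index named above can be sketched in a few lines. The sketch assumes the textbook definition, which the text does not give: A_ε = 1 - (equally distributed equivalent income)/mean, where the equivalent income is the power mean of order 1 - ε (the geometric mean when ε = 1) and ε ≥ 0 is an inequality-aversion parameter. Incomes must be strictly positive when ε ≥ 1; the function name and the data are hypothetical.

```python
import math


def atkinson(values, epsilon):
    """Atkinson index A_eps = 1 - (power mean of order 1 - eps) / arithmetic mean.

    epsilon >= 0 is the inequality-aversion parameter; incomes are assumed
    strictly positive (required for epsilon >= 1).
    """
    n = len(values)
    mu = sum(values) / n
    if epsilon == 1:
        # Limit case: the equally distributed equivalent is the geometric mean.
        ede = math.exp(sum(math.log(y) for y in values) / n)
    else:
        ede = (sum(y ** (1 - epsilon) for y in values) / n) ** (1 / (1 - epsilon))
    return 1 - ede / mu


incomes = [1.0, 4.0]  # hypothetical two-person economy
for eps in (0.5, 1, 2):
    print(eps, round(atkinson(incomes, eps), 4))
```

With these incomes the three indices come out at 0.1, 0.2 and 0.36: the same distribution looks more unequal as the aversion parameter ε grows, a choice the Gini coefficient does not expose.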
Credit risk
The Gini coefficient is also commonly used for the measurement of
the discriminatory power of rating systems in credit risk
management.
The discriminatory power refers to a credit risk model's ability to
differentiate between defaulting and non-defaulting clients. The
above formula G_1 may be used for the final model and also at
individual model factor level, to quantify the discriminatory power
of individual factors. A factor can show weak or even negative
discriminatory power as a result of too many non-defaulting clients
falling into the lower end of the points scale, e.g. a factor has a
10-point scale and 30% of non-defaulting clients are assigned the
lowest points available, e.g. 0 or negative points. This indicates
that the factor is behaving in a counterintuitive manner and would
require further investigation at the model development stage.
References: The Analytics of risk model validation
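A minimal sketch of G_1 applied to a rating factor, assuming per-band counts of defaulting and non-defaulting clients ordered from the riskiest band (fewest points) to the safest. Here X_k is taken as the cumulative share of defaulters and Y_k as the cumulative share of non-defaulters. This orientation is an assumption, chosen so that a factor which concentrates defaulters in low-point bands gives a positive value; with it, G_1 equals 2·AUC - 1, the form in which the Gini is often reported in credit scoring. The function name and all counts are hypothetical.

```python
def gini_credit(bands):
    """G_1 of a rating factor from per-band counts.

    `bands` is a list of (defaulters, non_defaulters) ordered from the
    riskiest band (lowest points) to the safest. X_k is the cumulative share
    of defaulters and Y_k the cumulative share of non-defaulters.
    """
    total_bad = sum(bad for bad, _ in bands)
    total_good = sum(good for _, good in bands)
    xs, ys = [0.0], [0.0]
    cum_bad = cum_good = 0
    for bad, good in bands:
        cum_bad += bad
        cum_good += good
        xs.append(cum_bad / total_bad)
        ys.append(cum_good / total_good)
    return 1.0 - sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) for k in range(1, len(xs))
    )


print(gini_credit([(10, 0), (0, 90)]))  # 1.0: perfect separation
print(gini_credit([(5, 45), (5, 45)]))  # 0.0: no discrimination
print(gini_credit([(0, 30), (10, 60)]))  # negative: counterintuitive factor
```

The third case mirrors the situation described above: non-defaulting clients sit in the lowest-point band while all defaulters score higher, so the factor ranks risk backwards and its G_1 is negative (about -0.33).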
References
 United Nations Development Programme
 Note that the calculation of the index for the United States
was changed in 1992, resulting in an upwards shift of about 2.
 http://www.eurofound.europa.eu/areas/qualityoflife/eurlife/index.php?template=3&radioindic=158&idDomain=3
 Ray, Debraj. Development Economics. Princeton University Press,
1998, p. 188.
 Friedman, David D.
 (Data from Statistics Sweden.)
 N. Blomquist, "A comparison of distributions of annual and
lifetime income: Sweden around 1970", Review of Income and Wealth,
Volume 27, Issue 3, pages 243-264.
 "Politics, work, and daily life in the USSR", James R. Millar,
1987, p. 193.
Further reading
 Gini, Corrado (1912). "Variabilità e mutabilità" Reprinted in
Memorie di metodologica statistica (Ed. Pizetti E, Salvemini, T).
Rome: Libreria Eredi Virgilio Veschi (1955).
 The Chinese version of this paper appears in
External links
 Deutsche Bundesbank: Do banks diversify loan portfolios?, 2005 (on
using e.g. the Gini coefficient for risk evaluation of loan
portfolios)
 Forbes Article, In praise of inequality
 Gini index calculated for all countries (from
internet archive)
 Measuring Software Project Risk With The Gini
Coefficient, an application of the Gini coefficient to
software
 The World Bank: Measuring Inequality
 Travis Hale, University of Texas Inequality Project: The
Theoretical Basics of Popular Inequality Measures, online
computation of examples: 1A, 1B
 United States Census Bureau List of Gini
Coefficients by State for Families and Households
 Article from The Guardian analysing inequality in the UK
1974-2006
 World Income Inequality Database
 Income Distribution and Poverty in OECD
Countries
 Software:
 A Matlab Inequality Package, including code for
computing Gini, Atkinson, Theil indexes and for plotting the Lorenz
Curve. Many examples are available.
 Free
Online Calculator computes the Gini Coefficient, plots the
Lorenz curve, and computes many other measures of concentration for
any dataset
 Free Calculator: Online and downloadable scripts (Python and Lua) for Atkinson, Gini, and Hoover
inequalities
 Users of the R data analysis software can install the "ineq"
package, which allows for computation of a variety of inequality
indices including Gini, Atkinson, Theil.