From | <S.Jenkins@lse.ac.uk> |
To | <S.Jenkins@lse.ac.uk> |
Subject | st: Computing the Gini or another inequality coefficient from a limited number of data points |
Date | Tue, 14 Feb 2012 14:26:26 -0000 |
[Reposted because, unbeknown to me, my institution's webmailer sent non-ascii text plus a nasty winmail.dat file. Subject: line more informative too.]

------------------------------

Date: Fri, 10 Feb 2012 09:52:02 +0100
From: Jen Zhen <jenzhen99@gmail.com>
Subject: st: Computing the Gini or another inequality coefficient from a limited number of data points

Dear list members,

I would like to compute a measure of income inequality similar to the Gini index. I do not know everyone's income, so need to make an approximation.

(1) For the 5 most recent years, I know for 6 income brackets how many individuals there are and their joint income, hence also the average income in the bracket. For the full-fledged Gini index I would need to know the area under the curve which shows the cumulative income against the cumulative number of tax payers (to visualize what I mean, look e.g. at the 2nd figure here: http://en.wikipedia.org/wiki/Gini_index). Now I believe that with the information I have I don't know the entire curve but I know only 7 points on it (the six points mentioned plus the origin). So I think I can approximate the said area if I simply assume that between the 7 points the line is straight, but that will systematically underestimate the true degree of inequality. So I'm wondering if there is a sensible way to smooth the curve and hence get a better approximation?

(2) For the 5 earliest years unfortunately I know only the number of individuals in each bracket but not their joint income. So my idea was that I would regress the mean income in each bracket on a 3rd-order function in the year to see how it develops in the 5 latest years and use this to predict/estimate the mean income for each bracket in the 5 earlier years, then use the procedure described in (1). A simpler alternative would be to just use the midpoint of each bracket, but I guess this would be less good.

Does this procedure sound sensible?
Or is there a better way to compute inequality from these data?

Thank you so much and best regards,
JZ

-------------------------------

You have received some useful suggestions. However, also note that there is a well-established literature on non-parametric approaches to estimation of inequality indices from data that are in grouped form. See e.g. FA Cowell and F Mehta "The estimation and interpolation of inequality measures", Review of Economic Studies, 49(2), April 1982, 273-290. And references therein. They also refer to derivation of upper and lower bounds.

With the information about your income distribution that you have (much grouping, but at least means within categories ... but what in the top unbounded range?), the bounds on your Ginis may be quite wide.

Stephen

-----------------------------------
Professor Stephen P. Jenkins
Department of Social Policy and STICERD/CASE
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0) 207 955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html
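
To illustrate the bounds mentioned in the reply, here is a minimal sketch in Python (the thread itself contains no code, so everything below is an assumption, not the method of the cited paper). The lower bound is the straight-line Lorenz curve through the known points, as described in (1): it treats everyone in a bracket as having the bracket mean. The upper bound assumes the bracket boundaries are also known and uses the standard argument that within-bracket inequality is largest when incomes sit at the two bracket edges with the bracket mean preserved. For an open-ended top bracket the limiting form of that term is used. The function name gini_bounds and the example figures are hypothetical.

```python
from math import isinf, inf


def gini_bounds(counts, means, edges):
    """Return (lower, upper) Gini bounds from grouped data.

    counts[i] : number of individuals in bracket i
    means[i]  : mean income in bracket i
    edges     : bracket boundaries, len(counts) + 1 values, ascending;
                the last one may be math.inf for an open top bracket
    Brackets must be ordered from poorest to richest.
    """
    n_total = sum(counts)
    income_total = sum(n * m for n, m in zip(counts, means))
    mu = income_total / n_total  # overall mean income

    # Lower bound: area under the piecewise-linear Lorenz curve
    # (trapezoid rule between the known points, starting at the origin).
    area = 0.0
    cum_share = 0.0
    for n, m in zip(counts, means):
        p = n / n_total             # population share of the bracket
        s = n * m / income_total    # income share of the bracket
        area += p * (cum_share + s / 2.0)
        cum_share += s
    lower = 1.0 - 2.0 * area

    # Upper bound: add the largest possible within-bracket contribution,
    # attained when incomes sit at the bracket edges a and b.
    extra = 0.0
    for n, m, a, b in zip(counts, means, edges[:-1], edges[1:]):
        p = n / n_total
        if isinf(b):
            spread = m - a          # limit of the term below as b grows
        else:
            spread = (m - a) * (b - m) / (b - a)
        extra += p * p * spread / mu
    return lower, lower + extra


# Made-up figures for six brackets, purely for illustration.
example_counts = [400, 300, 150, 100, 40, 10]
example_means = [6_000, 15_000, 25_000, 38_000, 70_000, 250_000]
example_edges = [0, 10_000, 20_000, 30_000, 50_000, 100_000, inf]
low, high = gini_bounds(example_counts, example_means, example_edges)
print(f"Gini lies between {low:.3f} and {high:.3f}")
```

The gap between the two numbers shows how much the grouping alone leaves undetermined. With few brackets and an unbounded top range it can be substantial, which is the point made in the reply.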