Last Monday I posted a query to Statalist about how to code my household survey data for various combinations of (1) a mean-preserving contraction in income inequality and (2) an increase in mean income that leaves the distribution unchanged, with various permutations/mixtures in between. I received extremely helpful replies from Austin Nichols, Stas Kolenikov, and Stephen Jenkins, and have since been following up on the literature and other useful suggestions they made. I am also trying to use a modified version of the basic code that Austin suggested (reproduced below); I have taken note of the caveats and cautions mentioned, but I think it will suffice for my purposes (which do not include submission to peer-reviewed journals). However, I have three follow-up questions on the code that I have still not been able to figure out on my own:
1) What is the reason for generating the variable x, and especially for doing this for only about half of the observations? (It is specified as "in 1/300", whereas there are almost 600 observations in the psidextract dataset once the command "keep if t==7" has been run.) I couldn't figure out why x is needed or why it is included in the subsequent kdensity and line commands. I was thinking of just dropping it from my modified version of the code, but was afraid that it might be very important and that dropping it would be a big mistake.
2) Sorry if this is a stupid question, but is there any particular reason to work in logs rather than original values for this type of work?
3) Perhaps my most important question: The code to generate lw2 actually seems to be MEDIAN-preserving and not mean-preserving (the relevant parts of the code for this are "g add=-((_n-1)*2/594-1)*.2; g lw2=add+lwage"). The "add" variable is 0 for the median person and hence lw2 is unchanged for that person, but raised for lower income earners and lowered for higher income earners, and the mean does actually change, unlike what is needed. I have tried to come up with a different method for preserving the mean (while reducing spread) but without success. What I need to do here is equivalent to a "stretch" as in the 3 S's of [(Stephen P. Jenkins and Philippe Van Kerm) "Accounting for income distribution trends: A density function decomposition approach" in Journal of Economic Inequality, 3(1), April 2005, 43-61] but in 'reverse' as I want to reduce not increase spread.
(I believe that after doing this simple exercise I should be able to work 'backwards' using the -gidecomposition- command to decompose the resulting change in poverty into growth, redistribution, and residual/interaction components. With the new mean-preserving, reduced-spread distribution, the growth component should be 0 and the redistribution component should account for the entire reduction in poverty; similarly, with the new distribution-preserving, higher-mean distribution, the growth component from -gidecomposition- should explain the entire change in poverty.)
Austin's simple suggested code (which he noted was just a silly example with important caveats):
webuse psidextract, clear
keep if t==7
g x=_n/100+5.6 in 1/300
kdensity lwage, at(x) g(d0) nogr
g lw1=lwage+.2
kdensity lw1, at(x) g(d1) nogr
sort lwage
g add=-((_n-1)*2/594-1)*.2
g lw2=add+lwage
kdensity lw2, at(x) g(d2) nogr
line d0 d1 d2 x, sort xli(6.2)
su lw*
Sorry to write again and bother people about the same issue, but I have been unable to figure this out on my own. I think I won't have any further questions on this issue once I get this sorted.
best,
Lola Jackson
----- Original Message ----
From: Stephen P. Jenkins <[email protected]>
To: [email protected]
Sent: Tuesday, 22 July, 2008 11:55:21 AM
Subject: st: poverty/inequality analysis
=====================================================
Date: Mon, 21 Jul 2008 15:27:02 -0400
From: "Austin Nichols" <[email protected]>
Subject: Re: st: poverty/inequality analysis
Lola and Stas:
Given Lola's reference to survey data, I assumed she wanted to work
with real income distributions, which are not lognormal (unfortunately
for us programmers). Here's a silly example reducing the "poverty"
rate (poverty line at 6.2 for no good reason) from 5% to 2% with
either an increase in mean or a decrease in dispersion, holding the
other constant:
webuse psidextract, clear
keep if t==7
g x=_n/100+5.6 in 1/300
kdensity lwage, at(x) g(d0) nogr
g lw1=lwage+.2
kdensity lw1, at(x) g(d1) nogr
sort lwage
g add=-((_n-1)*2/594-1)*.2
g lw2=add+lwage
kdensity lw2, at(x) g(d2) nogr
line d0 d1 d2 x, sort xli(6.2)
su lw*
Note that the mean-preserving decrease in dispersion I used does
generate some reranking. It so happens the same 12 people are poor
under either transformation, but YMMV.
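One way to check both statements, as a minimal sketch rather than part
of the example above: the shift add is linear in rank and symmetric
around zero when the hard-coded 594 equals _N-1 (written that way
below, which is an assumption about the intent), so it should average
to zero and leave the mean of lwage unchanged. The names r0, r2, and
poor0 to poor2 are introduced here for illustration only.

```stata
webuse psidextract, clear
keep if t==7
sort lwage
* rank-based shift from the example, with _N-1 in place of the hard-coded 594
generate double add = -((_n-1)*2/(_N-1) - 1)*.2
generate double lw1 = lwage + .2
generate double lw2 = lwage + add
* the mean of add should be zero up to rounding, so lwage and lw2 share a mean
summarize add lwage lw1 lw2
* reranking: compare each person's rank before and after the contraction
egen r0 = rank(lwage)
egen r2 = rank(lw2)
count if r0 != r2
* who falls below the (arbitrary) line of 6.2 under each distribution
generate byte poor0 = lwage < 6.2
generate byte poor1 = lw1 < 6.2
generate byte poor2 = lw2 < 6.2
tabulate poor1 poor2
```

This preserves the mean of log wages only; the mean of exp(lw2) need
not equal the mean of exp(lwage).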
No idea if that's the kind of thing Lola has in mind or not...
Lola--you may also want to read (for conceptual background)
"Trends in income inequality, pro-poor income growth, and income
mobility"
by Stephen P. Jenkins and Philippe Van Kerm in
Oxford Economic Papers 2006 58(3):531-548.
=====================================================
>>>>>>>>>
Thanks for the plug, Austin. However, a paper that is perhaps more
closely related to Lola's needs is
"Accounting for income distribution trends: a density function
decomposition approach", Journal of Economic Inequality, 3(1), April
2005, 43-61 (Stephen P. Jenkins and Philippe Van Kerm)
"Abstract. This paper develops methods for decomposing changes in the
income distribution using subgroup decompositions of the income
density function. Overall changes are related to changes in subgroup
shares and changes in subgroup densities, where the latter are broken
down further using elementary transformations of individual incomes.
These density decompositions are analogous to the widely-used
decompositions of inequality indices by population subgroup, except
that they summarize multiple features of the income distribution
(using graphs), rather than focusing on a specific feature such as
dispersion, and are not dependent on the choice of a specific summary
index. Nonetheless, since inequality and poverty indices can be
expressed as PDF functionals, our density-based methods can also be
used to provide numerical decompositions of these. An application of
the methods reveals the multi-faceted nature of UK income distribution
trends during the 1980s."
We decompose densities using a variation on the DiNardo-Fortin-Lemieux
idea -- using elementary transformations to explore the impacts of
changes in location, spread, and other distributional features --
what we call the three `S's of distributional change:
* sliding: a ceteris paribus shift of the PDF along the income line;
* stretching: a ceteris paribus increase in spread around a constant
mean; and
* squashing: a ceteris paribus disproportionate increase in density
mass on one side of the mode.
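Run in reverse, a stretch becomes a mean-preserving contraction. A
minimal sketch, assuming a simple linear shrink of each value toward
the sample mean (a stand-in for illustration, not necessarily the
transformation used in the paper); the shrink factor 0.8 and the names
mu, k, and lw3 are arbitrary choices:

```stata
webuse psidextract, clear
keep if t==7
* store the sample mean of the variable to be contracted
quietly summarize lwage
scalar mu = r(mean)
* shrink factor: values below 1 reduce spread, values above 1 increase it
scalar k = 0.8
* pull every observation toward the mean by the same proportion
generate double lw3 = scalar(mu) + scalar(k)*(lwage - scalar(mu))
* the mean should be unchanged and the standard deviation scaled by k
summarize lwage lw3
```

Because the transformation is linear and increasing, it does not
change anyone's rank. Applied to lwage it holds the mean of log wages
fixed; applied to incomes in levels it would hold mean income fixed.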
Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <[email protected]>
Director, Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
http://www.iser.essex.ac.uk
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/
Downloadable papers and software: http://ideas.repec.org/e/pje7.html
Learn about the UK's new household panel survey, the United Kingdom
Household Longitudinal Study: http://www.iser.essex.ac.uk/ukhls/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/