Hello,
I have individual-level data on grouped income, where an individual's income
may be in one of 31 bands. The bands get wider at higher incomes (the
highest being open-ended). I know the cut points for each band and the
number of observations/individuals within each band. I would like to compute
a continuous individual-level income variable from this. The most common
method seems to be to take the midpoint of the band and assign this value to
all individuals in the band, making some fairly arbitrary assumption about
the value to use for the highest (unbounded) band. It seems to me
this is very simplistic. I use Stata 7 and in the past I have used interval
regression to regress banded income on a number of explanatory variables and
then used the predictions from this model as a measure of predicted income.
Invariably this method leads to some observations having a predicted income
value outside their original band, which is not very satisfactory.
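
For concreteness, the midpoint approach described above amounts to something
like the following. This is only a minimal sketch, written in Python rather
than Stata; the cut points are made up for illustration (they are not my 31
bands), and the factor of 1.5 applied to the lower bound of the open-ended
band is just the kind of arbitrary assumption mentioned above.

```python
# Hypothetical cut points for illustration only.
lower = [0, 5000, 10000, 20000, 40000]
upper = [5000, 10000, 20000, 40000, None]  # None marks the open-ended top band

TOP_FACTOR = 1.5  # arbitrary assumption: top band valued at 1.5 x its lower bound


def band_midpoint(lo, hi, top_factor=TOP_FACTOR):
    """Midpoint of a closed band, or lo * top_factor for the open-ended band."""
    if hi is None:
        return lo * top_factor
    return (lo + hi) / 2


# One value per band; every individual in a band would receive that band's value.
midpoints = [band_midpoint(lo, hi) for lo, hi in zip(lower, upper)]
print(midpoints)
```

Every individual in a band gets the same value, so all within-band variation
is lost and the top-band value depends entirely on the chosen factor.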
I have been looking at the possibility of using some kind of kernel density
estimation, setting the bandwidths equal to the widths of the income bands.
However, as far as I can see, Stata 7's built-in -kdensity- command does not
allow different bandwidths across the distribution (my income bands are not
of equal width). Also, I am not sure how to use the resulting density
estimates to generate values for the observations within each band.
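
One possible way to turn a fitted density into within-band values, sketched
here in Python rather than Stata, is to assume a parametric form and give each
band its conditional mean under that fit. The sketch below assumes income is
lognormal (an assumption, not something the data establish), estimates mu and
sigma from a least-squares line through the probit of the cumulative shares
against the log cut points (a rough stand-in for proper grouped-data maximum
likelihood), and uses made-up cut points and counts. By construction each
value lies inside its band, including the open-ended one.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

# Hypothetical data: interior cut points and one count per band.
# The first band starts at 0 and the last band is open-ended.
cuts = [5000, 10000, 20000, 40000]
counts = [120, 310, 380, 150, 40]


def fit_lognormal(cuts, counts):
    """Fit mu, sigma from probit(cumulative share) = (log(cut) - mu) / sigma.

    Needs non-empty first and last bands so the shares stay strictly in (0, 1).
    """
    n = sum(counts)
    xs, zs, cum = [], [], 0
    for c, k in zip(cuts, counts):  # zip stops before the open-ended band
        cum += k
        xs.append(math.log(c))
        zs.append(STD.inv_cdf(cum / n))
    xbar = sum(xs) / len(xs)
    zbar = sum(zs) / len(zs)
    sxz = sum((x - xbar) * (z - zbar) for x, z in zip(xs, zs))
    sxx = sum((x - xbar) ** 2 for x in xs)
    sigma = sxx / sxz           # slope of the fitted line is 1 / sigma
    mu = xbar - zbar * sigma    # intercept of the fitted line is -mu / sigma
    return mu, sigma


def band_mean(lo, hi, mu, sigma):
    """E[X | lo < X < hi] for lognormal X; hi=None means open-ended."""
    a = (math.log(lo) - mu) / sigma if lo > 0 else -math.inf
    b = (math.log(hi) - mu) / sigma if hi is not None else math.inf
    mass = STD.cdf(b) - STD.cdf(a)                   # P(lo < X < hi)
    partial = STD.cdf(b - sigma) - STD.cdf(a - sigma)
    return math.exp(mu + sigma ** 2 / 2) * partial / mass


mu, sigma = fit_lognormal(cuts, counts)
bands = list(zip([0] + cuts, cuts + [None]))  # (lower, upper) for each band
means = [band_mean(lo, hi, mu, sigma) for lo, hi in bands]
for (lo, hi), m in zip(bands, means):
    print(lo, hi, round(m))
```

This still assigns one value per band; to get variation within bands one could
instead draw values from the fitted distribution truncated to each band, at
the cost of adding simulation noise.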
Any thoughts on how to proceed are much appreciated. I guess this must be a
common problem and applies to a whole range of variables, not just income.
Apologies in advance if I have missed something obvious.
Thank you.
Steve Morris