Hi,
I have top-coded, continuous CPS data on earnings. I want to impute the
mean income of this group of top-coded earners, making the assumption that
the upper tail follows a Pareto distribution. I'm wondering if anyone has
suggestions about how to do this in Stata (or even just generally how to
do it).
Some notes:
1.
The standard method of doing this typically involves imputing the mean of
top-coded earners given categories of earnings, using the following
formula:
Mean income for top-coded category = X * V/(V-1)
where:
X = lower limit of the top-coded/open-ended category (the top code)
V = (c-d)/(b-a)
where
a = log of lower limit of interval preceding top-coded/open-ended category
b = log of lower limit of top-coded/open-ended category
c = log of the sum of the frequencies in the top-coded category and the
category preceding it
d = log of the frequency in the top-coded category
The problem with using this method given continuous earnings data (like
the CPS) is that the result is highly dependent on the choice one makes
about what interval to define as the "preceding category."
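
To make the formula in note 1 concrete, here is a minimal sketch, written
in Python purely for illustration rather than in Stata. The function name,
the argument names, and the example limits and counts are all hypothetical;
only the formula itself comes from the description above.

```python
import math

def pareto_topcode_mean(lower_prev, topcode, n_prev, n_top):
    """Impute the mean of the top-coded category under a Pareto tail.

    lower_prev : lower limit of the interval preceding the top-coded category
    topcode    : lower limit of the top-coded category (X)
    n_prev     : number of cases in the preceding interval
    n_top      : number of cases in the top-coded category
    Returns (imputed mean, V).
    """
    a = math.log(lower_prev)
    b = math.log(topcode)
    c = math.log(n_prev + n_top)
    d = math.log(n_top)
    v = (c - d) / (b - a)
    if v <= 1:
        # A Pareto distribution has a finite mean only when V > 1.
        raise ValueError("V <= 1: the Pareto mean is not finite")
    return topcode * v / (v - 1), v

# Hypothetical counts: 300 cases in [50000, 100000), 100 cases at the top code.
mean_a, v_a = pareto_topcode_mean(50000, 100000, n_prev=300, n_top=100)

# Same top code, but a narrower "preceding category" with hypothetical counts.
mean_b, v_b = pareto_topcode_mean(75000, 100000, n_prev=120, n_top=100)

print(v_a, mean_a)  # V is about 2, so the mean is about twice the top code
print(v_b, mean_b)
```

With continuous data, n_prev is whatever happens to fall between lower_prev
and the top code, so moving lower_prev changes V and therefore the imputed
mean, as the two calls illustrate.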
2.
Another method would use the mode and median to solve the equation:
median = mode * 2^(1/V)
(using the observed median and mode of the sample to calculate V and solve
the equation above)
The problem here is that when the median is less than the mode, it gives a
value for V less than 1, such that multiplying the top code by V/(V-1) gives
a mean for the top-coded income that is LESS than the top code, much to my
consternation.
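
Here is a minimal sketch of the calculation in note 2, again in Python only
for illustration. Solving the equation for V gives V = ln(2) / ln(median/mode).
The function names and the example values are hypothetical. The second
example shows what goes wrong: when the median is below the mode, V comes
out negative (hence below 1), so V/(V-1) falls between 0 and 1.

```python
import math

def shape_from_median_mode(median, mode):
    """Solve median = mode * 2**(1/V) for V.

    Assumes median and mode are positive and not equal to each other.
    """
    return math.log(2) / math.log(median / mode)

def mean_multiplier(v):
    """Factor V/(V-1) applied to the top code to get the imputed mean."""
    return v / (v - 1)

# Hypothetical case where the median exceeds the mode: V is about 2.
v_ok = shape_from_median_mode(median=100000 * 2 ** 0.5, mode=100000)

# Hypothetical case where the median is below the mode: V is negative.
v_bad = shape_from_median_mode(median=80000, mode=100000)

print(v_ok, mean_multiplier(v_ok))
print(v_bad, mean_multiplier(v_bad))  # multiplier lies between 0 and 1
```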
Any help on this would be much appreciated!
Josh Guetzkow
Princeton University
Dept. of Sociology
Wallace Hall
Princeton, NJ 08544