Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Austin Nichols <austinnichols@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Interpretation of Oaxaca decomposition results after re-transformation of log scale |
Date | Wed, 12 Mar 2014 07:40:22 -0400 |
Vaidyanathan Ganapathy <vaidyang@usc.edu>:

This looks like a mistake--I wonder if -oaxaca- (SSC, SJ) should even
support the eform option, or issue a warning at least. You want a -glm-
with log link, not a regression of ln(y) on X, per
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/
http://www.stata.com/meeting/boston10/boston10_nichols.pdf
which is not currently supported by -oaxaca- AFAIK
...but could be, with a decomp of first moments as in the Yun (2008)
approach for logit/probit used by -oaxaca- since 2010:
http://www.stata.com/statalist/archive/2010-01/msg00042.html

Your current decomp has explained 52% and unexplained 70%, for a
total > 100%.

On another topic altogether, I wonder if your sample includes
stillbirths, or only infants surviving to some specified age.

On Wed, Mar 12, 2014 at 3:03 AM, Vaidyanathan Ganapathy
<vaidyang@usc.edu> wrote:
> Dear Statalisters,
>
> I am performing an Oaxaca type decomposition to understand the
> healthcare cost differences between two groups - controls and
> premature infants. Here is my specification:
>
> . oaxaca lnallhccx2 tpcat2-tpcat6 bpdx2 chdx2 asthbrdx2 resinfxdx2
> cnsdx2 motordx2 physdevdx2 nddx2 chrnic1 period, by(premie_cat) pooled
> vce(cluster pcn) eform
>
> The dependent variable is ln(healthcare costs) and the other variables
> are covariates including poverty levels (tpcat2-tpcat6) and certain
> medical diagnoses. Since the dependent variable is in log scale I used
> the -eform- option to exponentiate and report the predicted costs and
> the decomposed cost differentials. While I am able to interpret the
> predicted values for the two groups, I have some trouble in
> interpreting the overall, explained and unexplained differences. Here
> is the output -
>
> Blinder-Oaxaca decomposition                   Number of obs   =     137972
>
>           1: premie_cat = 0 (controls)
>           2: premie_cat = 1 (premature infants)
>
>                              (Std. Err. adjusted for 68994 clusters in pcn)
> -------------------------------------------------------------------------------
>               |               Robust
>    lnallhccx2 |     exp(b)   Std. Err.        z    P>|z|   [95% Conf. Interval]
> --------------+----------------------------------------------------------------
> Differential  |
>  Prediction_1 |   348.9868   1.737476   1176.03   0.000    345.598   352.4089
>  Prediction_2 |    956.743   75.23525     87.28   0.000   820.0862   1116.172
>    Difference |   .3647655   .0287414    -12.80   0.000   .3125676   .4256803
> --------------+----------------------------------------------------------------
> Decomposition |
>     Explained |   .5206098    .035001     -9.71   0.000   .4563368   .5939355
>   Unexplained |   .7006504   .0489067     -5.10   0.000   .6110629   .8033722
> -------------------------------------------------------------------------------
>
> Using simple math, it can be seen from the results in panel 1
> (Differential) that healthcare costs among controls are only 36.48% of
> healthcare costs among premature infants. This led me to the
> following interpretation of the overall cost differential between
> premature and control infants: healthcare costs among premature
> infants are about 174% higher than costs among controls, as
> predicted by the group models. Is it correct to make this
> interpretation?
>
> The interpretation of the decomposition results (panel 2 above) - the
> explained and unexplained components - doesn't seem to be that
> straightforward.
>
> Any help in understanding these difference estimates would be very helpful.
>
> Thanks!
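
For readers checking the arithmetic in the quoted output, here is a minimal sketch in Stata. The first part only reuses numbers copied from the table above and shows that the exponentiated components combine by multiplication rather than addition, which is why the two percentages do not sum to 100. The second part is a hedged illustration of the log-link model suggested in the reply; the variable name allhccx2 is hypothetical (the thread only shows the logged variable lnallhccx2), and the model is fit for one group only as an example, since -oaxaca- is not reported to support this specification.

```stata
* Part 1: multiplicative check, using numbers copied from the eform table above
display 348.9868/956.743        // Prediction_1/Prediction_2, about 0.365 (the Difference row)
display .5206098*.7006504       // Explained x Unexplained, also about 0.365
display 956.743/348.9868 - 1    // about 1.74, i.e. roughly 174% higher costs for group 2

* Part 2: sketch of a log-link model on the untransformed outcome.
* ASSUMPTION: allhccx2 is a hypothetical name for raw (unlogged) costs;
* it does not appear in the thread. Fit here for the control group only.
glm allhccx2 tpcat2-tpcat6 bpdx2 chdx2 asthbrdx2 resinfxdx2 cnsdx2 motordx2 ///
    physdevdx2 nddx2 chrnic1 period if premie_cat == 0,                     ///
    family(poisson) link(log) vce(cluster pcn)
```

Note that exponentiating a mean of ln(y) gives a geometric mean rather than the arithmetic mean of costs, so the ratios in Part 1 compare geometric means; the log-link model in Part 2 targets the arithmetic mean directly.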