Dear listservers,
Austin presented a nice solution to my problem for model based
standardization with bootstrapped confdence intervals for the variable dp
(difference in probabilities under two exposure scenarios):
prog dp
cap drop pr0 pr1 dp
logit Y X r2 r3 a2 a3
ren X wasX
g X=0
predict pr0
replace X=1
predict pr1
drop X
ren wasX X
g dp=pr1-pr0
mean dp
End
bs: dp
--------------------------------------------------------------
| Observed Bootstrap Normal-based
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
dp | .087443 .0160805 .0559259 .1189601
--------------------------------------------------------------
However, the CI above is based on the assumption of normality, and also is
not "bias-corrected". I can get percentile-based and bias-corrected Cis by
adding:
estat bootstrap, all
----------------------------------------------------------------------------
--
| Observed Bootstrap
| Mean Bias Std. Err. [95% Conf. Interval]
-------------+--------------------------------------------------------------
--
dp | .087443 -.0047041 .01608046 .0559259 .1189601
(N)
| .0616029 .1161034
(P)
| .0633786 .1163289
(BC)
----------------------------------------------------------------------------
--
However, if I attempt to obtain bias corrected accelerated (Bca) CIs by
simply replacing the statement:
bs: dp
With
bs, bca: dp
It appears to run 444 Jackknife replications (my sample size is N=444) but
then I get the error message: "insufficient observations to compute
jackknife standard errors" r(2000) which means "You have requested some
statistical calculation and there are no observations on which to perform
it". This doesn't make sense to me since I already obtained bootstrap CIs on
these data.
Does anyone have any idea why this might be happening? Thanks, Garth
Garth Rauscher
Division of Epid/Bios (M/C 923)
UIC School of Public Health
1603 West Taylor Street
Chicago, IL 60612
ph: (312)413-4317
fx: (312)996-0064
em: [email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Austin Nichols
Sent: Monday, February 04, 2008 11:30 PM
To: [email protected]
Subject: Re: st: model-based standardization
Garth,
I haven't looked at the ref you cite, but what you want is one of the
several quantities that economists refer to as marginal effects (see e.g.
the description of -margfx- and SE calcs at
http://glue.umd.edu/~gelbach/ado/margfx.pdf).
Your code fails because you have not multiplied estimated coefficients by
variables.
Try instead e.g.
logit Y X r2 r3 a2 a3
g p0=invlogit(_b[_cons]+ _b[r2]*r2+_b[r3]*r3+_b[a2]*a2+_b[a3]*a3)
g p1=invlogit(_b[_cons]+_b[X]+_b[r2]*r2+_b[r3]*r3+_b[a2]*a2+_b[a3]*a3)
but it's easier to use predict in any case:
ren X wasX
g X=0
predict pr0
replace X=1
predict pr1
drop X
ren wasX X
though you still have to make sure there are no neglected connections
between X and other variables (e.g. interactions).
To bootstrap, simply wrap it in a program:
prog dp
cap drop pr0 pr1 dp
logit Y X r2 r3 a2 a3
ren X wasX
g X=0
predict pr0
replace X=1
predict pr1
drop X
ren wasX X
g dp=pr1-pr0
mean pr0 pr1 dp
end
bs: dp
On Feb 4, 2008 11:11 PM, Garth Rauscher <[email protected]> wrote:
> [I tried to send this message to the listserv a few days ago but don't
> think it made it through so I am trying again. I apologize if this is
> a duplicate message.]
>
> Dear listserve members
>
> I am attempting to learn how to perform a model-based standardization
> with Stata, using the marginal or predictive margins method. I would
> like to be able to estimate standardized probabilities and probability
> differences from logistic regression that are standardized to the
> distribution of modeled covariates. The idea is summarized in:
> "Greenland S. Model-based estimation of relative risks and other
> epidemiologic measures in studies of common outcomes and in
> case-control studies. Am J Epidemiol 2004;160:301-305." To the best of
> my understanding, the method involves estimating predictied
> probabilities of Y under two scenarios (e.g. x=1 and x=0). Assuming we
> have a dependent variable Y(0,1), an exposure of interest X(0,1), and
> covariates
> r2 r3 a2 a3 a4 a5, two sets of predicted probabiltiies could be:
>
> P0(x) based on the joint distribution of covariates, with X=0 assigned
> to everyone
> P1(x) based on the joint distribution of covariates, with X=1 assigned
> to everyone
> PD(x) as the difference in probabilities, P1(x) - P0(x)
>
> Below is my code.
> logit Y X r2 r3 a2 a3 a4 a5
> // predicted xbetas after assigning all observations to X=0 g
> if0=_b[_cons]+_b[X]*0+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]
> // predicted xbetas after assigning all observations to X=1 g
> if1=_b[_cons]+_b[X]*1+_b[r2]+_b[r3]+_b[a2]+_b[a3]+_b[a4]+_b[a5]
> // predicted probabilities
> g p0x = invlogit(if0)
> g p1x = invlogit(if1)
> I was expecting two new variables of predictied probabilities, p0x and
> p1x with a range of values that depended on covariates. However, I
> noticed that p0x and p1x each had only one value instead of a range of
> values as I had expected (see above). Any clarification as to what I
> am doing incorrectly would be appreciated. I think my next task would
> have been to perform bootstrapping to get confidence intervals from
> the distribution of means for p0x, p1x and PD(x).
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/