I can't answer your deeper question about nested models.
The simpler question here is about the decision rule to drop variables
during the stepwise procedure. Stata is using precisely the decision
rule you specified in your command, -pr(0.2)-. That is the significance
level for removal, as shown in action in your output.
If you specify robust standard errors, which variables satisfy this
rule may well change, because different standard errors yield different
significance levels, but again you get what you ask for.
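As a rough illustration of that rule (a sketch in Python rather than Stata, and not Stata's own code), the z and P>|z| columns in your output can be reproduced from each odds ratio and its robust standard error, then compared with the 0.2 threshold. The function names are made up for this example, and it assumes the usual Wald test on the log-odds scale with a normal reference distribution.

```python
import math

PR = 0.2  # significance level for removal, as in pr(0.2)


def wald_from_odds_ratio(odds_ratio, se_odds_ratio):
    """Return (z, two-sided p) for the null hypothesis odds ratio = 1.

    Assumes the standard error is reported on the odds-ratio scale, so
    the standard error of the log odds ratio is se_odds_ratio / odds_ratio.
    """
    b = math.log(odds_ratio)              # coefficient on the log-odds scale
    se_b = se_odds_ratio / odds_ratio     # delta-method standard error of b
    z = b / se_b
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


def would_be_removed(p, pr=PR):
    """Backward-elimination rule: a term is a candidate to drop when p >= pr."""
    return p >= pr


# livecat as reported in the two final models of the example output
for label, odds_ratio, se in [
    ("livecat, first model", 0.5896426, 0.1786052),
    ("livecat, second model", 0.7364448, 0.1749249),
]:
    z, p = wald_from_odds_ratio(odds_ratio, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, remove = {would_be_removed(p)}")
```

For livecat this gives p of about 0.081 in the first model and about 0.198 in the second, both below 0.2, which is why it is retained each time. A different standard error changes z, and so p, without any change in the rule.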
On what you should do, that depends on how seriously you take the advice
of Frank Harrell and others that stepwise methods are generally a bad
idea. (Google for sources.)
Similarly, every expert has a different way to balance parsimony and
goodness of fit, and I would not want to try to add another.
Nick
John LeBlanc, reporting a query from Magda Szumilas
I'm a graduate student who is new to Stata. For my thesis, I'm trying to
figure out how I can test nested models when I'm forced to use robust
standard errors. Stata tells me that I can't use lrtest and I understand
that, since it depends on maximum likelihood estimates. So what does one
use?
Here's what I did. Having done an initial backwards stepwise logistic
regression at pr(0.2), I would like to manually create a parsimonious
model with the best possible fit. I assume that Stata is using some
decision rule to drop variables during the stepwise procedure; is this
what I should use when I try to drop them manually? What is Stata's
decision rule for stepwise logistic regression using robust standard
errors?
I found nothing in the manual and nothing helpful after extensive
searching on the web.
**************************************
An example below:
. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
begin with full model
p = 0.6664 >= 0.2000 removing markcat
p = 0.6006 >= 0.2000 removing edumomcat
p = 0.5856 >= 0.2000 removing _Igrade_12
p = 0.2054 >= 0.2000 removing sexorcat
p = 0.2113 >= 0.2000 removing _Igrade_11
p = 0.2592 >= 0.2000 removing partcat
Logistic regression                               Number of obs   =        580
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood = -266.26595                 Pseudo R2       =     0.0691

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .5896426   .1786052    -1.74   0.081      .325654    1.067631
   edudadcat |   1.602875   .1808557     4.18   0.000     1.284863    1.999597
  sexriskcat |   .4266733   .0246379   -14.75   0.000     .3810162    .4778014
    anysmoke |   2.502815    .266854     8.60   0.000     2.030824    3.084503
------------------------------------------------------------------------------
. estimates store full
. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
begin with full model
p = 0.6856 >= 0.2000 removing markcat
p = 0.5475 >= 0.2000 removing _Igrade_12
p = 0.2756 >= 0.2000 removing sexorcat
p = 0.2803 >= 0.2000 removing partcat
p = 0.2756 >= 0.2000 removing _Igrade_11
Logistic regression                               Number of obs   =        600
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood = -284.2349                  Pseudo R2       =     0.0489

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .7364448   .1749249    -1.29   0.198     .4623359    1.173067
   edudadcat |   1.366027   .2559876     1.66   0.096     .9461231     1.97229
   edumomcat |    1.35079    .278478     1.46   0.145      .901788     2.02335
    anysmoke |   2.571288   .1562286    15.54   0.000     2.282615    2.896468
------------------------------------------------------------------------------
. lrtest full
LR test likely invalid for models with robust vce
r(498);