st: "if" statement
I'd appreciate it if somebody could explain the following behavior of the "if" statement when used with "logistic" (I'm running Stata IC/10.1).
webuse nhanes2f
gen ageg=2 if age>=20 & age<30
replace ageg=3 if age>=30 & age<40
replace ageg=4 if age>=40 & age<50
replace ageg=5 if age>=50 & age<60
replace ageg=6 if age>=60 & age<70
replace ageg=7 if age>=70
replace sex=0 if sex==2
model 1 --> xi: logistic sex i.ageg
i.ageg _Iageg_2-7 (naturally coded; _Iageg_2 omitted)
Logistic regression                               Number of obs   =      10337
                                                  LR chi2(5)      =       2.80
                                                  Prob > chi2     =     0.7302
Log likelihood =  -7150.626                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
    _Iageg_7 |   .8963705   .0683978    -1.43   0.152     .7718562    1.040971
------------------------------------------------------------------------------
model 2 --> xi: logistic sex i.ageg if age>=30
i.ageg _Iageg_2-7 (naturally coded; _Iageg_2 omitted)
note: _Iageg_4 dropped because of collinearity
Logistic regression                               Number of obs   =       8017
                                                  LR chi2(4)      =       2.35
                                                  Prob > chi2     =     0.6713
Log likelihood = -5544.1939                       Pseudo R2       =     0.0002

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9789833   .0734452    -0.28   0.777     .8451164    1.134055
    _Iageg_5 |   .9451487   .0748512    -0.71   0.476     .8092619    1.103853
    _Iageg_6 |   .9931979   .0670654    -0.10   0.919     .8700789    1.133739
    _Iageg_7 |   .8989579   .0765455    -1.25   0.211     .7607821     1.06223
------------------------------------------------------------------------------
Why is age group 4 (40-49) dropped because of collinearity when there are 610 males and 660 females in this stratum? More worrisome, why is age group 2 (20-29) still reported as the reference when it should have been excluded by the "if" statement (i.e. _Iageg_3 should be the reference instead of _Iageg_2)?
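One way to probe this, offered as a sketch rather than a definitive diagnosis: if xi builds the _Iageg_* dummies from the full dataset before logistic applies the "if" (an assumption on my part, suggested by xi still reporting _Iageg_2 as omitted), then within age>=30 the dummies _Iageg_3 to _Iageg_7 sum to one for every observation and one of them has to be dropped. The commands below check that, and compare the model 1 odds ratios re-expressed relative to group 4 with the model 2 estimates.

```stata
* Run after model 2, so the _Iageg_* dummies created by xi are in memory.
* Group counts actually used in model 2
tabulate ageg sex if age>=30

* If the dummies always sum to 1 in this subsample, they are collinear
* with the constant (dummysum is a hypothetical scratch variable)
egen dummysum = rowtotal(_Iageg_3 _Iageg_4 _Iageg_5 _Iageg_6 _Iageg_7)
tabulate dummysum if age>=30

* Model 1 odds ratios divided by the model 1 odds ratio for group 4;
* compare with _Iageg_3 (.9789833) and _Iageg_5 (.9451487) in model 2
display .9761655/.9971218
display .9424283/.9971218
```

If the ratios agree with the model 2 estimates, group 4 is effectively serving as the reference in model 2, even though the xi note still names _Iageg_2.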
model 3 --> xi: logistic sex i.ageg if age<70
i.ageg _Iageg_2-7 (naturally coded; _Iageg_2 omitted)
note: _Iageg_7 dropped because of collinearity
Logistic regression                               Number of obs   =       9352
                                                  LR chi2(4)      =       0.86
                                                  Prob > chi2     =     0.9303
Log likelihood = -6472.0855                       Pseudo R2       =     0.0001

------------------------------------------------------------------------------
         sex | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Iageg_3 |   .9761655   .0632659    -0.37   0.710     .8597191    1.108384
    _Iageg_4 |   .9971218   .0696639    -0.04   0.967     .8695188    1.143451
    _Iageg_5 |   .9424283    .065592    -0.85   0.394     .8222534    1.080167
    _Iageg_6 |   .9903392   .0554211    -0.17   0.862     .8874609    1.105144
------------------------------------------------------------------------------
Now the "if" statement seems to work fine, as subjects with age>=70 are excluded (i.e. the _Iageg_7 group has been dropped!).
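A possible workaround for model 2, as a minimal sketch: xi lets the omitted category be set through the variable characteristic "omit", so the reference can be named explicitly instead of left to the collinearity check. I am assuming here that group 3 (30-39) is the intended reference for the age>=30 subsample.

```stata
* Ask xi to omit group 3 (instead of the lowest value) for ageg
char ageg[omit] 3

* Re-run model 2; xi now creates _Iageg_2 and _Iageg_4 to _Iageg_7
xi: logistic sex i.ageg if age>=30

* Clear the characteristic afterwards to restore the default behavior
char ageg[omit]
```

With this setting _Iageg_2 would be all zeros within age>=30, so I would expect it to be dropped the same way _Iageg_7 was dropped in model 3, leaving group 3 as the reference.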
The same behavior occurs when I run these models in Stata IC/9.2, and when modeling a different dichotomous variable in a different dataset.
Many thanks,
VICTOR M. HERRERA MD. MS.
Research Assistant
Population Health Sciences Department
University of Wisconsin
610 Walnut St. 626 WARF
Madison, WI 53726
(608) 265-3686