Ricardo,
The difference is probably due to the fact that you are fitting your
models on different sample sizes and, as a consequence, on samples with a
different mean age. This isn't a problem when you are computing
an unadjusted OR for a categorical variable. (Compare your unadjusted
-logistic- command with the equivalent -tabodds- command. In your example,
your first command, -xi: logistic low i.race-, computes the ORs
for a 3x2 table. You can replicate it by running
-tabodds low race, or-.)
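The point about unadjusted ORs can be sketched in a few lines of Python. This is for illustration only: the function name odds_ratios is hypothetical, and the counts are made up, not the lbw data. With a single categorical predictor, each OR is the cross-product ratio of that category against the reference category, so dropping a third category from the table does not change it.

```python
def odds_ratios(table, ref):
    """OR of each category vs. `ref`; table maps category -> (cases, non-cases)."""
    ref_cases, ref_noncases = table[ref]
    ref_odds = ref_cases / ref_noncases
    return {k: (c / nc) / ref_odds for k, (c, nc) in table.items() if k != ref}


# Hypothetical 3x2 table: race -> (count with low == 1, count with low == 0).
# These are NOT the lbw counts.
full = {1: (20, 70), 2: (10, 15), 3: (25, 45)}
race1_and_2 = {k: full[k] for k in (1, 2)}  # analogous to "if race==1 | race==2"

print(odds_ratios(full, ref=1)[2])
print(odds_ratios(race1_and_2, ref=1)[2])
```

Both lines print the same value, because the race-2 OR uses only the race-1 and race-2 rows of the table.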
When you add age as a continuous variable to your model AND use the -if-
qualifier, each model excludes one group of observations, either RACE2 or
RACE3, so your sample size changes (n=163 or n=122). The adjustment for
age is based on the mean age of the sample being used to estimate the OR.
Thus, as the sample changes, so does the mean age:
For n=189: mean age = 23.2381 (all races)
For n=163: mean age = 23.5092 (no RACE2)
For n=122: mean age = 23.7049 (no RACE3)
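As a rough arithmetic check, the group sizes implied by these samples (189 - 163 = 26 RACE2 and 189 - 122 = 67 RACE3 observations) let you back out the mean age of each excluded group from the figures above. This Python sketch uses only those rounded means, so its results are approximate.

```python
# Rounded figures from the reply above
n_all, mean_all = 189, 23.2381   # all races
n_no2, mean_no2 = 163, 23.5092   # RACE2 excluded
n_no3, mean_no3 = 122, 23.7049   # RACE3 excluded

total_age = n_all * mean_all
n_race2 = n_all - n_no2          # observations dropped when RACE2 is excluded
n_race3 = n_all - n_no3          # observations dropped when RACE3 is excluded
n_race1 = n_all - n_race2 - n_race3

# Each excluded group's age total is the full total minus the subsample total
mean_race2 = (total_age - n_no2 * mean_no2) / n_race2
mean_race3 = (total_age - n_no3 * mean_no3) / n_race3
mean_race1 = (total_age - n_race2 * mean_race2 - n_race3 * mean_race3) / n_race1

print(f"RACE1 ~{mean_race1:.2f}, RACE2 ~{mean_race2:.2f}, RACE3 ~{mean_race3:.2f}")
```

The implied means (about 24.3 for RACE1, 21.5 for RACE2 and 22.4 for RACE3) show that both excluded groups are younger on average, which is why dropping either one raises the mean age.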
-p
______________________________________
Paul F. Visintainer, PhD
Department of Epidemiology and Biostatistics
School of Public Health
New York Medical College
PH: (914) 594-4804
FX: (914) 594-4853
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ricardo Ovaldia
Sent: Wednesday, June 04, 2008 10:15 AM
To: [email protected]
Subject: st: Stratify analysis - logistic regression with dummies
I am confused by some of the results that I got. I will illustrate using
Hosmer & Lemeshow's low birth weight data:
. use http://www.stata-press.com/data/r10/lbw.dta
(Hosmer & Lemeshow data)
If I fit
. xi:logistic low i.race
and then fit
. xi:logistic low i.race if race==1 | race==2
and
. xi:logistic low i.race if race==1 | race==3
I get the same OR for _Irace_2 and _Irace_3 as I do for the full
model. This is as expected because the dummies are orthogonal to each
other.
However, when a covariate is added to the model, the same is not true
anymore:
. xi:logistic low i.race age
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_2 |   2.106974   .9932407     1.58   0.114     .8363679    5.307878
    _Irace_3 |   1.767748   .6229325     1.62   0.106     .8860686    3.526738
         age |   .9612592   .0311206    -1.22   0.222     .9021588    1.024231
------------------------------------------------------------------------------
. xi:logistic low i.race age if race==1 | race==2
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_2 |   2.155207   1.021287     1.62   0.105     .8513944     5.45566
         age |   .9705512   .0376446    -0.77   0.441     .8995039     1.04721
------------------------------------------------------------------------------
. xi:logistic low i.race age if race==1 | race==3
------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    _Irace_3 |   1.724551   .6098827     1.54   0.123     .8622856    3.449063
         age |   .9440875   .0340586    -1.59   0.111     .8796392    1.013258
------------------------------------------------------------------------------
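A small simulation reproduces the pattern in these fits. This is a sketch in Python with a hand-written Newton-Raphson logistic fit, not Stata's implementation. The data are synthetic: the group sizes, ages and effect sizes in the hypothetical make_data() are made up for illustration. Without age, the race-2 OR is identical whether or not race 3 is in the sample. With age added, the two estimates differ, because the age term is estimated from whichever observations the -if- qualifier keeps.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, y, iters=50):
    """Maximum-likelihood logistic regression coefficients via Newton-Raphson."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX (observed information)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def make_data(seed=2008):
    """Hypothetical (race, age, low) records; every parameter here is made up."""
    rng = random.Random(seed)
    rows = []
    for race, n, mean_age in [(1, 90, 24.0), (2, 40, 21.5), (3, 70, 22.5)]:
        for _ in range(n):
            age = rng.gauss(mean_age, 5.0)
            eta = -0.8 + 0.7 * (race == 2) + 0.5 * (race == 3) - 0.04 * (age - 23.0)
            low = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
            rows.append((race, age, low))
    return rows


def race2_or(rows, keep, with_age):
    """OR for the race-2 dummy (race 1 = reference), fitted on races in `keep`."""
    dummies = sorted(k for k in keep if k != 1)  # one indicator per non-reference race
    X, y = [], []
    for race, age, low in rows:
        if race not in keep:
            continue
        row = [1.0] + [1.0 if race == k else 0.0 for k in dummies]
        if with_age:
            row.append(age)
        X.append(row)
        y.append(low)
    beta = fit_logit(X, y)
    return math.exp(beta[1 + dummies.index(2)])


rows = make_data()
print("unadjusted:  ", race2_or(rows, {1, 2, 3}, False), race2_or(rows, {1, 2}, False))
print("age-adjusted:", race2_or(rows, {1, 2, 3}, True), race2_or(rows, {1, 2}, True))
```

In the unadjusted line the two numbers agree, as in Ricardo's first three fits. In the age-adjusted line they differ, as in the output above.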
There is no missing data.
I am very confused about which OR to report and what the differences
between these models are. I was not expecting these results.
Thank you in advance,
Ricardo.
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/