Since race is a single categorical variable with 3 levels, the acceptable
approach is to fit your model on the total sample, with all categories of
race represented. Don't restrict your analysis to subgroups unless you have
an a priori question or you've tested for and found significant interactions
with race (e.g., a race-by-age interaction). This approach will preserve the
integrity of the statistical inquiry.
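A minimal Stata sketch of the approach recommended above. It assumes the Hosmer & Lemeshow data used later in the thread is still available at the stata-press.com URL and uses the same `xi:` syntax as the original posts. The main-effects model is fitted on the total sample, and a likelihood-ratio test then checks for a race-by-age interaction. The stored-estimate names `main` and `inter` are arbitrary labels chosen for this example.

```stata
* Hosmer & Lemeshow low birth weight data, as used in this thread
use http://www.stata-press.com/data/r10/lbw.dta, clear

* Main-effects model on the total sample: all three race categories included
xi: logistic low i.race age
estimates store main

* Add race-by-age terms: xi expands i.race*age into the race indicators,
* age itself, and the products of each race indicator with age
xi: logistic low i.race*age
estimates store inter

* Likelihood-ratio test of the two interaction terms
lrtest inter main
```

In Stata 11 and later, the same two models can be written without `xi:` as `logistic low i.race c.age` and `logistic low i.race##c.age`.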
-p
______________________________________
Paul F. Visintainer, PhD
Department of Epidemiology and Biostatistics
School of Public Health
New York Medical College
PH: (914) 594-4804
FX: (914) 594-4853
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ricardo Ovaldia
Sent: Thursday, June 05, 2008 2:04 PM
To: [email protected]
Subject: Re: st: RE: Stratify analysis - logistic regression with dummies
Thank you, Paul. That makes perfect sense. However, the question of which
OR is best to report remains, especially if the means of the continuous
variable differ for each level of the class variable.
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK
--- On Thu, 6/5/08, Visintainer, Paul <[email protected]> wrote:
> From: Visintainer, Paul <[email protected]>
> Subject: st: RE: Stratify analysis - logistic regression with dummies
> To: [email protected]
> Date: Thursday, June 5, 2008, 9:49 AM
> Ricardo,
>
> The difference is probably due to the fact that you are developing your
> models on different sample sizes and, as a consequence, a different mean
> age for each sample. This isn't a problem when you are computing an
> unadjusted OR for a categorical variable. (Compare your unadjusted -logit-
> command with the equivalent -tabodds- command. In your example, your first
> logit command, -xi: logistic low i.race-, is computing the ORs for a 3x2
> table. You can replicate the logit command by running
> -tabodds low race, or-.)
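A minimal sketch of the comparison suggested above, assuming the thread's dataset URL still resolves. The unadjusted ORs from `xi: logistic low i.race` should match the odds ratios that -tabodds- reports for race levels 2 and 3 against level 1, since -tabodds- uses the first category as the reference by default.

```stata
use http://www.stata-press.com/data/r10/lbw.dta, clear

* Unadjusted ORs for race from logistic regression (race 1 is the reference)
xi: logistic low i.race

* The same ORs computed directly from the 3x2 table of race by low
tabodds low race, or
```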
>
> When you add age as a continuous variable to your model AND use the "if"
> statement, your model alternately excludes observations that are either
> RACE2 or RACE3, so your sample size changes (e.g., n=122 or n=163). The
> adjustment for age is based on the mean age for the sample being used to
> estimate the OR. Thus, as the sample changes, so does the mean age:
>
> For n=189: mean age== 23.2381
> For n=163: mean age== 23.5092 (no RACE2)
> For n=122: mean age== 23.7049 (no RACE3)
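A minimal sketch for reproducing the sample sizes and mean ages listed above, assuming the same dataset. Each -summarize- uses the same `if` condition as the corresponding subset model in the original post.

```stata
use http://www.stata-press.com/data/r10/lbw.dta, clear

* Mean age in each of the three estimation samples
summarize age                            // all three race groups
summarize age if race == 1 | race == 3   // race 2 excluded
summarize age if race == 1 | race == 2   // race 3 excluded
```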
>
> -p
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Ricardo Ovaldia
> Sent: Wednesday, June 04, 2008 10:15 AM
> To: [email protected]
> Subject: st: Stratify analysis - logistic regression with dummies
>
> I am confused by some of the results that I got. I will illustrate using
> Hosmer & Lemeshow's low birth weight data:
>
> . use http://www.stata-press.com/data/r10/lbw.dta
> (Hosmer & Lemeshow data)
>
> If I fit
>
> . xi:logistic low i.race
>
> and then fit
>
> . xi:logistic low i.race if race==1 | race==2
>
> and
>
> . xi:logistic low i.race if race==1 | race==3
>
> I get the same OR for _Irace_2 and _Irace_3 as I do for the full model.
> This is as expected because the dummies are orthogonal to each other.
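A sketch that checks this numerically rather than by eye, assuming the same dataset and the `xi:` indicator names (`_Irace_2`, `_Irace_3`) that appear in the posted output. The scalar names are hypothetical labels chosen for this example. With only the race indicators in the model, the fitted odds for each race group equal that group's observed odds, so each OR depends only on the reference group and the group being compared, and dropping the third group should leave it unchanged up to convergence tolerance.

```stata
use http://www.stata-press.com/data/r10/lbw.dta, clear

* Full model: both race ORs, race 1 as reference
quietly xi: logistic low i.race
scalar or2_full = exp(_b[_Irace_2])
scalar or3_full = exp(_b[_Irace_3])

* Subset without race 3: OR for race 2 vs race 1
quietly xi: logistic low i.race if race == 1 | race == 2
scalar or2_sub = exp(_b[_Irace_2])

* Subset without race 2: OR for race 3 vs race 1
quietly xi: logistic low i.race if race == 1 | race == 3
scalar or3_sub = exp(_b[_Irace_3])

* Relative differences between full-sample and subset ORs
display "race 2: " reldif(or2_full, or2_sub)
display "race 3: " reldif(or3_full, or3_sub)
```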
>
> However, when a covariate is added to the model, this is no longer true:
>
>
> . xi:logistic low i.race age
>
> ------------------------------------------------------------------------------
>          low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     _Irace_2 |   2.106974   .9932407     1.58   0.114     .8363679    5.307878
>     _Irace_3 |   1.767748   .6229325     1.62   0.106     .8860686    3.526738
>          age |   .9612592   .0311206    -1.22   0.222     .9021588    1.024231
> ------------------------------------------------------------------------------
>
> . xi:logistic low i.race age if race==1 | race==2
>
> ------------------------------------------------------------------------------
>          low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     _Irace_2 |   2.155207   1.021287     1.62   0.105     .8513944     5.45566
>          age |   .9705512   .0376446    -0.77   0.441     .8995039     1.04721
> ------------------------------------------------------------------------------
>
> . xi:logistic low i.race age if race==1 | race==3
>
> ------------------------------------------------------------------------------
>          low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>     _Irace_3 |   1.724551   .6098827     1.54   0.123     .8622856    3.449063
>          age |   .9440875   .0340586    -1.59   0.111     .8796392    1.013258
> ------------------------------------------------------------------------------
>
>
> There is no missing data.
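A sketch for putting the three age-adjusted fits side by side, assuming the same dataset; the stored-estimate names are arbitrary labels for this example. The posted output shows that the age OR also shifts across the three samples (.961, .971 and .944), and -estimates table- with the `eform` option lists the race and age ORs from each fit in parallel columns, with each model's sample size in the last row.

```stata
use http://www.stata-press.com/data/r10/lbw.dta, clear

* Age-adjusted model on the total sample (n = 189)
quietly xi: logistic low i.race age
estimates store full

* Same model without race 3 (n = 122)
quietly xi: logistic low i.race age if race == 1 | race == 2
estimates store no_race3

* Same model without race 2 (n = 163)
quietly xi: logistic low i.race age if race == 1 | race == 3
estimates store no_race2

* Odds ratios (eform) from all three fits in one table, plus each N
estimates table full no_race3 no_race2, eform b(%9.4f) stats(N)
```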
>
>
> I am very confused about which OR to report and what the differences
> between these models are. I was not expecting these results.
>
> Thank you in advance,
> Ricardo.
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>