Re: st: "Separation" issue in clustered/Longitudinal binary data.
From
Maarten buis <[email protected]>
To
[email protected]
Subject
Re: st: "Separation" issue in clustered/Longitudinal binary data.
Date
Wed, 22 Dec 2010 10:23:21 +0000 (GMT)
--- On Wed, 22/12/10, [email protected] asked:
> > The outcome variable is a binary variable (a patient
> > reported drug's side effect) with repeated measures for
> > three waves. Now I have an intervention (whether the
> > participant received the drug). <snip>
--- On Wed, 22/12/10, Maarten buis answered:
> I may be missing something obvious, but don't you need to
> use the drug in order to experience its side-effects. <snip>
> If something like that is happening in your data, then it is
> hard to see how an "effect" of your treatment could have a
> meaningful substantive interpretation.
To expand a bit on this answer: The problems with separation
are a logical consequence of how we define effects in
"logit-like" models. The effect is a ratio of odds. Consider
the example below:
*--------------- begin example ------------------
// get some data and prepare it
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .
gen byte baseline = 1
// estimate a logistic regression
logit good i.foreign baseline, or nocons
*---------------- end example ---------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )
The number reported for baseline is the baseline odds,
the number of successes per failure for someone (in this
case some car) who has the value 0 on all covariates. So
for a domestic (=US) car we expect to find .297 cars
with a good repair record for every car with a bad repair
record. The effect of foreign tells us that the odds of
having a good repair record is 20.18 times larger for
foreign cars than domestic cars.
It is also instructive to look at the individual odds.
In the example below we keep the indicator variables for
both categories in the model, including the reference
category, and exclude the constant instead. Each reported
number is then the odds for one group.
*---------------- begin example -------------------
// get the odds for foreign and domestic cars
logit good ibn.foreign, nocons or
// odds ratio is a well-chosen name for this statistic,
// as it is literally a ratio of odds
di exp(_b[1.foreign])/exp(_b[0.foreign])
*----------------- end example --------------------
Here we see that, as before, the odds of having a good
repair record for domestic cars is .297 good cars for
every bad car. We can now also see that the odds of
having a good repair record for foreign cars is 6 good
cars for every bad car. The odds ratio we found in the
first example is thus literally the ratio of these odds.
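As a check, the same odds can be computed by hand from the
cell counts, without fitting any model. This is only a
minimal sketch: it assumes the auto data and the variable
good as defined in the first example, and the local macro
names are just illustrative. It should reproduce the .297,
6, and 20.18 reported by -logit-.
*--------------- begin example ------------------
// get the data and the outcome as in the first example
sysuse auto, clear
gen byte good = rep78 > 3 if rep78 < .
// the cross-table the odds are based on
tab good foreign
// count good and bad cars within each group
count if good == 1 & foreign == 0
local good0 = r(N)
count if good == 0 & foreign == 0
local bad0 = r(N)
count if good == 1 & foreign == 1
local good1 = r(N)
count if good == 0 & foreign == 1
local bad1 = r(N)
// odds = number of good cars per bad car
di "odds domestic = " `good0'/`bad0'
di "odds foreign  = " `good1'/`bad1'
// odds ratio = ratio of these two odds
di "odds ratio    = " (`good1'/`bad1')/(`good0'/`bad0')
*---------------- end example ---------------------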
In your case your baseline odds is 0: for patients who
have not been given the drug there are 0 patients who
experience the side-effects for every patient who did
not experience the side-effects. How many times larger
is the odds of experiencing the side-effects if the
baseline is 0? There is just no answer to that question.
You can also see that by noticing that the odds ratio is
in that case some number divided by 0, which is undefined.
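To make this concrete, here is a small sketch with made-up
counts. They are not from your data: I assume 20 untreated
patients, none with the side-effect, and 20 treated
patients, 5 of them with the side-effect. The variable
names drug, sideeffect, and n are hypothetical as well.
*--------------- begin example ------------------
// hypothetical counts, for illustration only
clear
input drug sideeffect n
0 0 20
1 0 15
1 1  5
end
// turn the counts into one observation per patient
expand n
// count patients with and without the side-effect
count if sideeffect == 1 & drug == 0
local yes0 = r(N)
count if sideeffect == 0 & drug == 0
local no0 = r(N)
count if sideeffect == 1 & drug == 1
local yes1 = r(N)
count if sideeffect == 0 & drug == 1
local no1 = r(N)
// the baseline odds is 0 patients with the side-effect
// for every patient without it
local odds0 = `yes0'/`no0'
local odds1 = `yes1'/`no1'
di "baseline odds = " `odds0'
di "treated odds  = " `odds1'
// division by 0 gives a missing value (.) in Stata
di "odds ratio    = " `odds1'/`odds0'
// -logit- has the same problem; -capture noisily- shows
// whatever it reports without stopping the do-file
capture noisily logit sideeffect i.drug
*---------------- end example ---------------------
The last -di- shows a missing value rather than a number,
and I would expect -logit- to note that drug predicts the
outcome perfectly instead of reporting an odds ratio for it.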
As I understand it, what commands like -firthlogit- do is
assume that the baseline odds isn't really 0 in the
population, but that the odds is so small that just
because of randomness your sample by accident did not find
any successes in your baseline group. However, if the
baseline odds is truly 0, as is probably the case in your
data by definition, then these methods cannot help. You
can run these programs, but the results just don't mean
anything.
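A sketch of what "you can run these programs" looks like,
again with made-up counts (20 untreated patients without
any side-effect, 20 treated patients of whom 5 report it)
and hypothetical variable names. -firthlogit- is a
user-written command, so I assume here that it has been
installed from SSC, and I call it without any options.
*--------------- begin example ------------------
// hypothetical counts, for illustration only
clear
input drug sideeffect n
0 0 20
1 0 15
1 1  5
end
// one observation per patient
expand n
// -firthlogit- is user-written; install it once with:
// ssc install firthlogit
// -capture noisily- shows the output but does not stop
// the do-file if the command is not installed
capture noisily firthlogit sideeffect drug
*---------------- end example ---------------------
If it runs, it will report a finite coefficient for drug.
That number only makes sense if you believe that untreated
patients could in principle have experienced the
side-effect.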
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------