Thank you, Dr. Gould, for a thorough and clear
explanation.
I have a similar problem related to conditional
logistic regression. I have data from a multi-center
(7 clinics) study. I analyzed the data using
conditional logistic regression, grouping on clinic. I was
asked to defend my method, because previous analyses of
these data were performed using indicator variables or
simply using a robust variance estimator.
I am planning on using the explanation from Dr. Gould's
post; however, the argument that I would use for
conditional logistic regression is the same as that
presented for the indicator variables (dummies). So I am
missing something: what is the difference? By the way, the
results I obtained using conditional logistic regression
and dummies are very similar.
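For concreteness, the three analyses being compared can be
sketched roughly as follows. The variable names (y for the
binary outcome, x1 and x2 for the covariates, clinic for the
clinic identifier) are hypothetical placeholders, not the names
in the actual data:

```stata
* Hypothetical variable names: y, x1, x2, clinic

* Ordinary logit with a robust variance estimator clustered on clinic
logit y x1 x2, vce(cluster clinic)

* Logit with clinic indicator variables (dummies)
xi: logit y x1 x2 i.clinic

* Conditional logistic regression, grouping on clinic
clogit y x1 x2, group(clinic)
```

The first command leaves the plain logit coefficients unchanged
and adjusts only the standard errors, the second estimates one
coefficient per clinic, and the third conditions the clinic
effects out of the likelihood.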
Thank you,
Ricardo
--- "William Gould, StataCorp LP" <[email protected]>
wrote:
> Daniel Koralek <[email protected]> writes about using
> -stcox- on individual
> data where each individual was recruited from one of
> ten centers. He is
> concerned that center may influence survival
> because "different foods
> eaten in different regions may influence nutrients".
>
> He considers three ways of dealing with this problem,
>
> . stcox ..., vce(cluster center)               (1)
>
> . xi: stcox ... i.center                       (2)
>
> . stcox ..., strata(center)                    (3)
>
> and, of course, he could ignore center altogether
>
> . stcox ... [center completely omitted]        (0)
>
> As a matter of notation, let's assume the other
> covariates in the
> models (the ... part) are x1 and x2.
>
> My comments are as follows:
>
> Re solution (0):
>
> This solution assumes center has no effect and
> Daniel has already
> raised concerns that it does, so the solution
> is inappropriate.
>
> Re solution (1):
>
> This solution also assumes center has no
> effect; it instead
> conservatively handles the situation where the
> individual patients
> are overly homogeneous, which is to say, not
> independent draws.
> Actually, I didn't say that exactly right for
> the Cox model, but
> what I said implies what I should have
> said, which is that
> the selections of the failures from the risk pools
> at each failure time
> are not independent.
>
> Daniel tried solution (1) and found that the
> standard errors changed,
> but the reported coefficients did not.
> Exactly. Under solution (1),
> because center has no effect, the coefficients
> estimated the standard
> way are fine, although perhaps inefficient.
> The lack of independence,
> however, means standard errors usually will be
> understated and
> -vce(cluster center)- handles that.
>
> Re solution (2):
>
> This solution assumes that center does have a
> direct effect on
> survival, and it constrains the effect to be a
> multiplicative
> shift in the baseline hazard function. The
> baseline hazard
> function ho(t) is a function of time, such as
>
> [sketch: baseline hazard ho(t) plotted against time]
>
> FYI, the baseline survival function So(t) is
> the integral of
> ho(t), negated and exponentiated. There's
> nothing deep there;
> that's just the mathematical formula for
> calculating one
> from the other. I switched to hazard
> functions, however,
> because the hazard function is the natural
> metric for the Cox model.
> The hazard rate for a particular individual in
> the data at a particular
> time is just ho(t)*exp(X_i*b), where X_i are
> the individual's covariates
> at time t. That's why I said solution (2)
> constrains each center's
> effect to be a multiplicative shift of ho(t).
>
> Concerning our use of dummy variables for the
> centers,
> we would like to think that we chose this
> particular functional form
> because it is truly representative of how the
> different
> foods served in the different centers
> influence the hazard, but
> the fact is that we chose this functional
> form because it is
> convenient; the effect of each center is
> wrapped up in just a
> single coefficient.
>
> This is not a bad approach.
>
> Re solution (2.5):
>
> Alright, I admit that Daniel did not include a
> solution (2.5), but
> I want to add it; it will help to understand
> solution (2), and
> is often useful in and of itself.
>
> Solution (2) was
>
> . xi: stcox ... i.center                       (2)
>
> Solution (2.5) is
>
> . xi: stcox ... i.center i.center*x1           (2.5)
>
> In this solution, we assume that center does
> not merely shift
> the hazard function in a multiplicative way;
> we assume that
> center also modifies the effect of x1.
>
> Actually, there are a lot of solution (2.5)'s.
> I could have chosen
> x2 rather than x1,
>
> . xi: stcox ... i.center i.center*x2
>
> or even x1 and x2,
>
> . xi: stcox ... i.center i.center*x1 i.center*x2
>
> Anyway, in this modeling-based approach, we
> need to think carefully
> about how the different foods served in the
> centers affect the
> baseline hazard function. Is it just a
> shift (solution 2),
> or do the different foods modify the effect of x1
> (solution 2.5), or
> something else?
>
> We also need to appreciate that we are assuming
> the SHAPE of the
> hazard function is the same across all
> centers and that we are
> just moving it up and down, multiplicatively.
>
>
> Re solution (3):
>
> In this solution, we let the baseline hazard be
> different for each
> center. That is, rather than assuming the
> baseline function is
>
> [sketch: baseline hazard ho(t) plotted against time]
>
> for all centers, albeit shifted, we assume
> that the above picture might
> be the baseline function for center 1, and for
> center 2, the function
> could be completely different:
>
> [sketch: a differently shaped baseline hazard ho(t) plotted against time]
>
=== message truncated ===
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK