Great! Thank you,
Ricardo
--- "David W. Harless" <[email protected]> wrote:
> Ricardo Ovaldia wrote:
> > Dear all,
> >
> > I posted this under a different header and did not get a reply. So
> > let me ask the question better.
> >
> > What is the difference between conditional logistic regression
> > grouping on clinic and unconditional logistic regression including
> > clinic as a dummy (indicator) variable? That is, what is the
> > difference in model assumptions and parameter estimates?
> >
> > Thank you,
> > Ricardo.
>
> The most important difference is that logit/logistic regression with
> dummy variables for groups is inconsistent unless the number of
> observations per group is large. There is a brief discussion of this
> (including cites) in the manual entry for -clogit- (page 224 of the
> Release 9 Reference A-J manual).
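>
> In Stata terms, and using hypothetical variable names (case for the
> 0/1 outcome, age for a covariate, clinic for the grouping variable),
> the approaches being compared would look roughly like this:
>
>     clogit case age, group(clinic)          // conditional (fixed-effects) logistic regression
>     xi: logit case age i.clinic             // unconditional logit with one dummy per clinic
>     logit case age, cluster(clinic) robust  // pooled logit with cluster-robust standard errors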
>
> Way back in February 2000, Bill Gould and Vince Wiggins posted the
> note pasted below, which gives a good explanation of these issues.
>
> Dave Harless
>
> > Jen Ireland <[email protected]> wrote,
> >
> > > I am estimating a logit model in which I have clustered the
> > > observations on the basis of a particular variable, not otherwise
> > > included in the model, as I have reason to believe that the
> > > observations may not be independent within the clusters.
> > >
> > > A colleague has argued that I could do just as well by simply
> > > including the clustering variable as an explanatory variable in my
> > > model. Why is it better to use clustering?
> >
> > Unless there is something very odd about Jen's problem about which
> > he is not telling us, I assume Jen's colleague is suggesting not that
> > Jen simply include the cluster variable as a single variable in his
> > model, but that Jen include a set of dummies for each value of the
> > cluster variable.
> >
> > Assume I have data grouped into clusters and I label the clusters 1,
> > 2, 3, and so on. If I included the cluster variable as a single
> > variable, I would obtain a single coefficient for the cluster
> > variable -- call it b -- and I would be saying that the effect of
> > being in the first cluster is b, the effect of being in the second
> > cluster is 2*b, and so on.
> >
> > But my labeling of the groups as cluster 1, 2, 3, is arbitrary, I
> > assume. I could just as well reorder the clusters, putting what is
> > now cluster 3 into the first position, cluster 1 in the second, and
> > so on. Then I could call those clusters 1, 2, 3, ..., and therein
> > lies a problem.
> >
> > So I assume that the suggestion was to include a dummy variable for
> > the first cluster, another dummy variable for the second, and so on.
> >
> > Given that interpretation, and with respect, I must disagree with
> > Jen's colleague. To make a long story short (which long story I am
> > about to tell), Jen's colleague perhaps wished to suggest Jen use
> > conditional logistic regression (clogit) as an alternative to
> > -logit, cluster() robust-. Had he said that, I would, in some cases,
> > have agreed.
> >
> >
> > The basis of Jen's colleague's comment
> > --------------------------------------
> >
> > Rather than using the clustering correction to calculate the
> > standard errors, one could instead model the clustering. If one does
> > that, and if one has the modeling (meaning the assumptions) right,
> > one should be able to produce more efficient estimates than those
> > produced by -robust cluster()-.
> >
> > Within-cluster correlation can arise for any number of reasons, but
> > one particular reason is that each cluster has its own intercept. In
> > that case, one is tempted to estimate those intercepts by simply
> > including the dummy variables.
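> >
> > In symbols, with i indexing the clusters and t indexing observations
> > within a cluster, the model just described is
> >
> >     Pr(y_it = 1 | x_it) = invlogit(a_i + x_it*b)
> >
> > with a separate intercept a_i for every cluster and a common slope b.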
> >
> > That approach works in the case of linear regression, but it does
> > not work in general. Said technically, the asymptotics are violated.
> > Call the number of clusters n and the average number of observations
> > within cluster T, so that the total number of observations is
> > N = n*T. As T->infinity, all is well. As n->infinity, however, both
> > the number of estimated parameters (coefficients on the dummy
> > variables) and the number of observations are going to infinity
> > together, and only in strange cases does it work out that any of the
> > estimated parameters approach their true values.
> >
> > The strange case is linear regression, and that occurs because it is
> > linear (although the reason is not transparent).
> >
> > In the case of logistic regression, however, the estimates one
> > obtains from including all the dummies are biased and, even as
> > n->infinity, that bias never goes away. Vince Wiggins
> > <[email protected]> and I recently simulated this and discovered that
> > this is not a sterile, theoretical argument -- the estimates one
> > obtains for the parameters are genuinely bad.
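> >
> > A minimal sketch of such a simulation (not the original one; the
> > design here -- 100 clusters of 4 observations each, a single
> > covariate x with a true slope of 1, and modern Stata's
> > rnormal()/runiform() functions -- is purely illustrative) might look
> > like:
> >
> >     clear
> >     set seed 12345
> >     set obs 100                           // n = 100 clusters
> >     gen clinic = _n
> >     gen a = rnormal()                     // cluster-specific intercept
> >     expand 4                              // T = 4 observations per cluster
> >     gen x = rnormal()
> >     gen y = runiform() < invlogit(a + x)  // true slope on x is 1
> >     xi: logit y x i.clinic                // dummy-variable logit: slope too large
> >     clogit y x, group(clinic)             // conditional logit: slope close to 1
> >
> > Increasing the number of clusters does not shrink the dummy-variable
> > bias; only increasing the number of observations per cluster does.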
> >
> > To obtain good estimates, one must develop a new estimator. Models
> > with separate intercepts per cluster are known as "fixed-effects
> > models". In the case of logistic regression, this fixed-effects
> > estimator is conditional logistic regression.
> >
> > Thus, conditional logistic regression -- Stata's -clogit- command --
> > is an alternative to using -robust cluster()-. In the case where the
> > correlation arises because of fixed effects (different intercepts
> > across groups), -clogit- is better than -robust cluster()- because it
> > produces more efficient estimates, meaning more accurate estimates
> > with smaller standard errors. And it is even better than that,
> > because there is now more going on in this model than just
> > correlation within cluster (namely, the possibility of correlation of
> > the fixed effects with other covariates), and -clogit- takes that
> > into account, too.
> >
> > However, correlation within group can arise for a lot of reasons.
> > Perhaps the observations within groups are serially correlated, or
> > perhaps two of the observations are whoppingly correlated and, after
> > that, there is not much correlation at all, or perhaps the
> > correlation structure differs across the clusters. In that case,
> > -clogit- will not produce correct standard errors.
> >
> > Meanwhile, -robust cluster()- will continue to produce correct
> > standard errors for its inefficient but population-wise consistent
> > estimates.
> >
> > -- Bill -- Vince
> > [email protected] [email protected]
> >
Ricardo Ovaldia, MS
Statistician
Oklahoma City, OK
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/