Re: st: question about cluster() option in regression
--
Thus there is nothing "improper" about the cluster option. The
difference in estimated coefficients is due to a difference in how
observations are weighted. In -reg- with or without a cluster option,
all observations get the same weight, unless you specify otherwise.
Multi-level models, on the other hand, estimate variances of the
random effects and use the information to give observations different
weights.
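
A minimal sketch of the weighting difference, in Python rather than Stata so that it is self-contained. It uses an intercept-only model (estimating an overall mean) with made-up data, and it assumes the between-cluster variance tau2 and within-cluster variance sigma2 are known; a real multi-level program would estimate them. The function names are hypothetical and the numbers are for illustration only.

```python
# Illustration only: an intercept-only model with two clusters of unequal size.
# tau2 (between-cluster variance) and sigma2 (within-cluster variance) are
# assumed known here; a multi-level program would estimate them from the data.

def ols_mean(clusters):
    """Equal weight for every observation: the plain grand mean."""
    values = [y for ys in clusters.values() for y in ys]
    return sum(values) / len(values)


def gls_mean(clusters, tau2, sigma2):
    """Random-intercept GLS estimate of the overall mean.

    Each cluster mean gets weight 1 / (tau2 + sigma2 / n_j), so the weight
    given to an observation depends on the variance components.
    """
    num = 0.0
    den = 0.0
    for ys in clusters.values():
        n_j = len(ys)
        w_j = 1.0 / (tau2 + sigma2 / n_j)
        num += w_j * (sum(ys) / n_j)
        den += w_j
    return num / den


clusters = {"a": [2.0, 4.0, 3.0, 3.0], "b": [8.0, 10.0]}  # made-up data

print("OLS (equal weights):      ", ols_mean(clusters))
print("GLS with tau2=4, sigma2=1:", gls_mean(clusters, 4.0, 1.0))
print("GLS with tau2=0, sigma2=1:", gls_mean(clusters, 0.0, 1.0))
```

With tau2 set to 0 the GLS weights are proportional to cluster size and the estimate coincides with the equal-weight one; as tau2 grows, the estimate moves toward the unweighted average of the cluster means.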
Contrary to your impression, this phenomenon is not "rare" in
statistical software. In SAS, for example, you will find the same
difference between estimates from 1) the GENMOD and SURVEYREG
procedures, which have equivalent cluster options, and 2) the
multi-level MIXED procedure.
The main difference between the two approaches, cluster or
multilevel, is that the cluster() option provides model-free standard
errors. The multi-level programs require a correct model for the
variance structure, for example that standard deviations are constant
at each level. If the model for the variance structure is correct,
estimates from multi-level programs will be more efficient, and
standard errors will be more precisely estimated. In Stata and SAS,
you can combine the advantages of both: fit a multi-level model, but
get cluster-robust standard errors. In -gllamm-, you can go further:
fit a multi-level model and also account for clusters that are not
part of the model.
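
To illustrate that a cluster adjustment changes the standard error but not the point estimate, here is a rough Python sketch for the simplest case, an overall mean. The data are made up and the function name is hypothetical. The cluster-robust formula below is the basic sandwich form with no finite-sample correction, so it should not be expected to match software output exactly; packages such as Stata typically apply a small-sample adjustment on top of it.

```python
import math


def mean_with_standard_errors(clusters):
    """Return (estimate, se_iid, se_cluster) for the overall mean.

    se_iid assumes independent observations with a common variance.
    se_cluster sums the residuals within each cluster before squaring,
    so it makes no assumption about the within-cluster correlation.
    No finite-sample correction is applied in this sketch.
    """
    values = [y for ys in clusters for y in ys]
    n = len(values)
    estimate = sum(values) / n

    # Conventional standard error: sqrt(s^2 / n).
    ss = sum((y - estimate) ** 2 for y in values)
    se_iid = math.sqrt(ss / (n - 1) / n)

    # Cluster-robust: sqrt of the sum over clusters of
    # (sum of residuals in the cluster)^2, divided by n.
    meat = sum(sum(y - estimate for y in ys) ** 2 for ys in clusters)
    se_cluster = math.sqrt(meat) / n

    return estimate, se_iid, se_cluster


# Made-up data: four clusters of unequal size.
clusters = [[2.0, 4.0, 3.0, 3.0], [8.0, 10.0], [5.0, 5.0, 6.0], [1.0, 2.0]]

est, se_iid, se_cluster = mean_with_standard_errors(clusters)
print("estimate:         ", est)
print("iid SE:           ", se_iid)
print("cluster-robust SE:", se_cluster)
```

In this example the observations within a cluster resemble each other, so the cluster-robust standard error comes out larger than the conventional one while the estimate itself is unchanged.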
-Steve
On Feb 12, 2009, at 10:11 AM, ronggui wrote:
Hi all,
When the data are clustered rather than independent, there are many
possible ways to handle it, such as dummy variables, panel data models
(fixed or random effects), GEE, and multilevel models (e.g.
http://www.gseis.ucla.edu/courses/ed230bc1/notes3/cluster.html). Of
course, the cluster() option of reg is one quick choice too. Yet I have
noticed that the results (especially the coefficients) from reg differ
from those of a multilevel model. Furthermore, this kind of adjustment
is rare in other statistical software. All of this makes me doubt the
practice of regression with the cluster() option in the Stata way. I
wonder if there is a paper on this, especially on whether this
adjustment is proper or improper. Besides, can we predict the change in
the level of significance from using cluster()? Thanks in advance.