Dear Statalisters:
I have a question about Stata's -suest- command that I hope someone can
answer for me. I have seen it asked by others a few times over the past
year, without any response.
It is my understanding that the Hausman test, which is often used to
evaluate the consistency of random effects estimates, cannot be used with
survey (i.e., clustered, probability-weighted) data. I was wondering whether
the -suest- command could be used to implement a valid version of the
Hausman test (comparing random and fixed effects specifications) for survey
data. I have attempted this using the code given at the end of this message.
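For reference, here is a sketch of the standard Hausman contrast that this would generalize. The notation is mine and does not appear in the code below. Let k be the number of regressors common to both models:

```latex
\begin{aligned}
\hat d &= \hat\beta_{FE} - \hat\beta_{RE}, \qquad
\hat C = \widehat{\mathrm{Cov}}(\hat\beta_{FE}, \hat\beta_{RE}) \\
\widehat{\mathrm{Var}}(\hat d) &= \hat V_{FE} + \hat V_{RE} - \hat C - \hat C' \\
H &= \hat d'\,\bigl[\widehat{\mathrm{Var}}(\hat d)\bigr]^{-1}\hat d
  \;\overset{a}{\sim}\; \chi^2_k \quad \text{under } H_0
\end{aligned}
```

The textbook version sets C equal to V_RE, so the middle line reduces to V_FE - V_RE. That shortcut is valid only when the random effects estimator is efficient under the null hypothesis. With probability weights and clustering this is generally not the case, so the cross-covariance C has to be estimated directly. -suest- estimates exactly this joint covariance.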
Some background first. I have data from a multistage probability sample of
the US population (n=3773) with oversamples of blacks and Hispanics. I am
interested in estimating a design-consistent model allowing for a
respondent-level random effect. I wish to compare the random effects
specification against the corresponding fixed effects model using the
Hausman test. To estimate the random effects model, I do the following:
(1) generate weighted estimates of the variance components
(2) apply a GLS transform to the data
(3) estimate the model from the transformed data using -regress-
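Steps (1) to (3) above amount to the usual random effects quasi-demeaning, y_it - theta * ybar_i. Below is a minimal NumPy sketch of step (2). It rests on a few assumptions:

* The panel is balanced, with t observations per respondent. The Stata code at the end puts 12 in that position of the theta formula.
* The respondent means are weighted, as the -collapse ... [pw = ttowgt]- step produces.
* The function names and toy data are hypothetical, chosen only for illustration.

```python
import numpy as np


def theta_from_components(sigma_e, sigma_u, t):
    """Quasi-demeaning weight for a balanced panel with t observations per unit."""
    return 1.0 - np.sqrt(sigma_e**2 / (t * sigma_u**2 + sigma_e**2))


def quasi_demean(x, groups, weights, theta):
    """Return x - theta * (weighted mean of x within its group), row by row.

    x       : values of one variable in long format (one row per observation)
    groups  : respondent identifier for each row
    weights : sampling weight for each row
    theta   : scalar, e.g. from theta_from_components()
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # inv maps each row to the index of its respondent
    _, inv = np.unique(np.asarray(groups), return_inverse=True)
    inv = inv.ravel()
    # Weighted respondent means, analogous to collapse (mean) ... [pw=w], by(id)
    group_mean = (np.bincount(inv, weights=weights * x)
                  / np.bincount(inv, weights=weights))
    return x - theta * group_mean[inv]


# Hypothetical toy data: two respondents, two observations each
x = np.array([1.0, 3.0, 10.0, 20.0])
ids = np.array([1, 1, 2, 2])
w = np.array([1.0, 1.0, 1.0, 3.0])

theta = theta_from_components(sigma_e=1.0, sigma_u=1.0, t=3)
print(theta)  # 0.5
print(quasi_demean(x, ids, w, theta))
```

The two limiting cases show what theta does. With theta = 0 the data are left untransformed, which gives pooled regression. With theta = 1 the transform becomes the within (fixed effects) transformation. Applied to a column of ones, the transform yields 1 - theta, which is the value the code's one3 variable holds.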
According to Korn and Graubard, the above procedure may not always work. It
does in my case because I have a large number of sufficiently large PSUs:
the parameter estimates and standard errors I get match those obtained with
SUDAAN (which estimates the corresponding covariance pattern model).
To perform the Hausman test, I do the following:
(1) I concatenate the GLS-transformed and original data using -append-
(2) Using -regress- with the score option, I estimate the random effects
model from the GLS-transformed data and save the estimates
(3) Using -regress- with the score option, I estimate the fixed effects
model from the original data (including dummies for respondents) and save
the estimates
(4) I perform the simultaneous estimation using -suest- with the svy option
(5) I perform Hausman's test for the consistency of the random effects model
by testing the difference between the two coefficient vectors (excluding the
constant and fixed effects)
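Step (5) is a Wald test of H0: b_RE = b_FE on the stacked coefficient vector that -suest- returns. Below is a hedged NumPy illustration of that computation, under these assumptions:

* The common coefficients and their joint (2k x 2k) covariance matrix are already in hand. In practice they would be the relevant rows and columns of -suest-'s e(b) and e(V).
* hausman_wald is a hypothetical helper, not a Stata function.

The sketch uses a generalized inverse and reports the rank of Var(d) as the degrees of freedom, because a joint covariance matrix need not have full rank. Stata's -test- after survey estimation reports an F version of the statistic, so the numbers will not match this chi-square form exactly.

```python
import numpy as np


def hausman_wald(b_re, b_fe, v_joint):
    """Wald statistic for H0: b_re == b_fe.

    b_re, b_fe : coefficient vectors of length k (same variable order)
    v_joint    : (2k x 2k) covariance matrix of the stacked vector [b_re, b_fe]
    Returns (chi-square statistic, degrees of freedom).
    """
    b_re = np.asarray(b_re, dtype=float)
    b_fe = np.asarray(b_fe, dtype=float)
    if b_re.shape != b_fe.shape:
        raise ValueError("b_re and b_fe must have the same length")
    k = b_re.size
    # Contrast matrix: a @ [b_re, b_fe] == b_re - b_fe
    a = np.hstack([np.eye(k), -np.eye(k)])
    d = a @ np.concatenate([b_re, b_fe])
    # Var(d) = V_re + V_fe - C - C', read off the joint covariance
    v_d = a @ np.asarray(v_joint, dtype=float) @ a.T
    # Generalized inverse; df is the rank of Var(d), which may be < k
    df = int(np.linalg.matrix_rank(v_d))
    stat = float(d @ np.linalg.pinv(v_d) @ d)
    return stat, df


# Hypothetical example: k = 2, independent unit-variance estimates
stat, df = hausman_wald([1.0, 1.0], [0.0, 1.0], np.eye(4))
print(f"W = {stat:.3f}, df = {df}")  # W = 0.500, df = 2
```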
The above procedure seems to work: -suest- gives me the correct parameter
estimates and standard errors for the two models. However, I notice that I
can only test for differences in 8 coefficients simultaneously, even though
there are 12 independent variables in each model (excluding the constant
and the respondent dummies in the fixed effects specification).
Interestingly, it does not seem to matter which 8 coefficients I test; I
always get the same result (i.e., the same F statistic and p-value). My
thought is that this must somehow be related to the fact that my data are
clustered (i.e., that I am allowing for clustering at the level of the
PSU). In other words, I suspect it is a peculiarity of my data and that the
code below is working correctly. Does this sound plausible?
Any feedback you could provide me with would be greatly appreciated. Thank
you very much.
Regards,
Jim
James W. Shaw, PhD, PharmD, MPH
Post-Doctoral Fellow
Tobacco Control Research Branch
Behavioral Research Program
Division of Cancer Control and Population Sciences
National Cancer Institute
/* STATA CODE */
/* GLS TRANSFORM DATA */
collapse (mean) depvar m1-a2 d1 c3 c32 [pw = ttowgt], by(rti_id)
ren depvar depvar2
ren m1 m12
ren m2 m22
ren s1 s12
ren s2 s22
ren u1 u12
ren u2 u22
ren p1 p12
ren p2 p22
ren a1 a12
ren a2 a22
ren c3 c3n
ren c32 c32n
sort rti_id
save "E:\Dissertation\Data\temp1", replace
use "E:\Dissertation\Data\tempus.dta", clear
drop _merge
sort rti_id
merge rti_id using "E:\Dissertation\Data\temp1"
xtreg depvar m1-a2 c3 c32 [iw = ttowgt], i(rti_id) mle
gen theta = 1 - sqrt(e(sigma_e)^2/(12*e(sigma_u)^2 + e(sigma_e)^2))
gen depvar3 = depvar - theta*depvar2
gen m13 = m1 - theta*m12
gen m23 = m2 - theta*m22
gen s13 = s1 - theta*s12
gen s23 = s2 - theta*s22
gen u13 = u1 - theta*u12
gen u23 = u2 - theta*u22
gen p13 = p1 - theta*p12
gen p23 = p2 - theta*p22
gen a13 = a1 - theta*a12
gen a23 = a2 - theta*a22
gen c33 = c3 - theta*c3n
gen c323 = c32 - theta*c32n
gen one = 1
summ one
scalar omean = r(mean)
gen one3 = one - theta*omean
/* SAVE TRANSFORMED DATA FOR RANDOM EFFECTS ESTIMATION */
gen res = 1
sort psu rti_id time
save "E:\Dissertation\Data\temp1", replace
/* RENAME RAW (UNTRANSFORMED) VARIABLES FOR FIXED EFFECTS ESTIMATION */
use "E:\Dissertation\Data\tempus.dta", clear
ren depvar depvar3
ren m1 m13
ren m2 m23
ren s1 s13
ren s2 s23
ren u1 u13
ren u2 u23
ren p1 p13
ren p2 p23
ren a1 a13
ren a2 a23
ren c3 c33
ren c32 c323
gen one3 = 1
gen res = 0
/* APPEND TRANSFORMED DATA TO RAW DATA */
sort psu rti_id time
append using "E:\Dissertation\Data\temp1"
/* ESTIMATE RANDOM EFFECTS MODEL */
svyset [pw = ttowgt], psu(psu)
reg depvar3 one3 m13-a23 c33 c323 if res == 1 [iw = ttowgt], score(RE) nocons
est store RE
/* ESTIMATE FIXED EFFECTS MODEL */
tab rti_id, gen(id)
reg depvar3 one3 m13-a23 c33 c323 id2-id3773 if res == 0 [iw = ttowgt], score(FE) nocons
est store FE
/* USE -SUEST- TO PERFORM HAUSMAN TEST */
suest RE FE, svy
test [RE_mean = FE_mean]: m13 m23 s13 s23 u13 u23 p13 p23 a13 a23 c33 c323