A while ago, a post on this forum discussed the equivalence of
-xtreg, fe-, manual demeaning, and -xtdata, fe- for estimating fixed
effects models. It showed that all three methods lead to the same
point estimates (although, as the post did not mention, the standard
errors will differ because the degrees-of-freedom adjustment is
incorrect once the data have been demeaned by hand):
http://www.stata.com/statalist/archive/2004-07/msg00230.html
Now my question relates to unbalanced panels. The example by Scott
Merryman above used a perfectly balanced panel. His results do not
carry over, though, when the panel is unbalanced, as in this example
(modeled on his):
*** begin code ***
*** load the data, then drop about 20% of the observations at random
*** so that the panel becomes unbalanced
webuse grunfeld, clear
gen randomnumber=uniform()
drop if randomnumber>0.8
drop randomnumber
*** run the same regressions
xi i.time, pre(D)
qui xtreg inv mval kstock D*, fe r
est store xtreg
qui areg inv mval kstock D*, r a(company)
est store areg
*Demean in time dimension
foreach var of varlist invest mvalue kstock {
egen mean_`var'_time = mean(`var'), by(time)
gen demean_`var' = `var' - mean_`var'_time
}
qui xtreg demean_invest demean_mvalue demean_kstock, fe r
est store demeaned
*Demean in the time and cross-section dimensions
foreach var of varlist invest mvalue kstock {
egen mean_`var'_time_com = mean(demean_`var'), by(com)
gen demean2_`var' = demean_`var' - mean_`var'_time_com
}
qui reg demean2_invest demean2_mvalue demean2_kstock, r
est store demeaned2
*Use -xtdata, fe- to demean data
tsset company year
xtdata company year invest mvalue kstock time, fe clear
tsset year company
xtdata, fe clear
reg invest mval kst, r
est store xtdata
est table xtreg areg demeaned demeaned2 xtdata, ///
keep(mvalue kstock demean_mvalue demean_kstock ///
demean2_mvalue demean2_kstock ) se
*** end code ***
While the results based on manual demeaning are slightly off the
mark (by about 5%, as a ballpark figure), those based on -xtdata-
are far off, sometimes to the point of producing no estimate at all
because of insufficient variation in the observations (of course,
this depends on your draw of "randomnumber" above).
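For the manual-demeaning part, here is a minimal sketch of one possible
refinement. It relies on the fact that, in an unbalanced panel, a single
pass of year demeaning followed by company demeaning does not remove both
sets of means at once, whereas repeating the two passes until nothing
changes does. The seed, the tolerance of 1e-8, the cap of 1000 iterations
and the dm_ variable names are my own arbitrary choices for illustration,
and the sketch assumes a Stata version with -runiform()- and factor
variables. The standard errors from -regress- will still be off because
of the DoF adjustment.

```stata
* Sketch only: alternate year and company demeaning until convergence
webuse grunfeld, clear
set seed 12345              // arbitrary seed, assumed for reproducibility
drop if runiform() > 0.8    // unbalance the panel

foreach var of varlist invest mvalue kstock {
    gen double dm_`var' = `var'
    local change = 1
    local iter = 0
    while `change' > 1e-8 & `iter' < 1000 {
        * remove year means, then company means, from the current values
        egen double m_t = mean(dm_`var'), by(year)
        quietly replace dm_`var' = dm_`var' - m_t
        egen double m_c = mean(dm_`var'), by(company)
        quietly replace dm_`var' = dm_`var' - m_c
        * largest company mean just removed; near zero once both
        * the year means and the company means have vanished
        quietly summarize m_c
        local change = max(abs(r(min)), abs(r(max)))
        drop m_t m_c
        local ++iter
    }
}

* the slopes of the two regressions should agree up to the tolerance
regress dm_invest dm_mvalue dm_kstock
xtreg invest mvalue kstock i.year, fe
```

Because the loop only needs -egen- and -replace-, it never builds the
full set of dummies, which is the reason for trying it on a large panel.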
So my questions are:
- Why is the -xtdata, fe- method so far off? That is, what algorithm
does this command follow, and why does it fail when the panel is
unbalanced?
- More importantly: what would you suggest doing with an unbalanced
panel that is too large to be handled by -xtreg, fe- (matsize
problems, etc.)?
Thanks for your thoughts and suggestions,
Davide