Dear Statalisters,
I do not usually follow up on my own questions when they have gone
unanswered, but I will break that rule to post some further findings
on this issue. Perhaps someone will be interested in them at a later
point in time.
A week ago I noticed that the equivalence (at least in terms of point
estimates, not of std.err.) between -xtreg,fe-, -xtdata,fe- and manual
demeaning does not hold when the panel is unbalanced. I discussed this
using the example of the "grunfeld" panel dataset, which had been used
in a previous Statalist posting (see below).
There are two more insights, plus one open problem, that I would like
to share:
(1) In the example I made below, one of the crucial features was that
we had two-way fixed effects (time and company FE). With simple
one-way FE (company only), there would be no problem with manual
demeaning or -xtdata,fe-.
(2) With two-way FE, the time dummies need to be demeaned (within
company) as well. This had not been done in the example by Scott
Merryman in July 2004, and it was not a problem as long as the panel
was balanced. When the panel is unbalanced, demeaning the "other" set
of dummies is crucial.
(3) -xtdata,fe- still does not work properly with the unbalanced
dataset, and I have not found out why. Any suggestions are still
appreciated.
For those who are interested, here are two sets of Stata code to
demonstrate these insights (refer also to my original posting for the
full example).
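A note on reproducibility: both code sets drop observations at random,
so the exact estimates change from run to run. The following minimal
sketch, which is not part of the original example, fixes the seed
before the draw and then lists the pattern of observations per company
so that one can confirm the panel is unbalanced. The seed value 12345
is an arbitrary assumption; any value will do.

*** BEGIN CODE
webuse grunfeld, clear
*** arbitrary seed (an assumption, not from the original posting)
set seed 12345
gen randomnumber=uniform()
drop if randomnumber>0.8
drop randomnumber
*** declare the panel and show which years are present for each company
xtset company year
xtdescribe
*** END CODE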
Best, Davide
*** BEGIN CODE
*** Part (1), a.k.a.: one way FE and unbalanced panel are not a problem
*** load the dataset and drop some of the observations
webuse grunfeld, clear
gen randomnumber=uniform()
drop if randomnumber>0.8
drop randomnumber
*** run the regressions
*** -xtreg,fe-
qui xtreg inv mval kstock, fe r
est store xtreg
*** Demean in company dimension
foreach var of varlist invest mvalue kstock {
egen mean_`var'_comp = mean(`var'), by(company)
gen demean_`var' = `var' - mean_`var'_comp
}
*** plain OLS on the demeaned data (-xtreg,fe- would demean a second time)
qui reg demean_invest demean_mvalue demean_kstock, r
est store demeaned
*** Use -xtdata,fe- to demean data
tsset company year
xtdata company year invest mvalue kstock time, fe clear
reg invest mval kst, r
est store xtdata
est table xtreg demeaned xtdata, ///
keep(mvalue kstock demean_mvalue demean_kstock) se
*** END CODE
*** BEGIN CODE
*** Part (2), a.k.a.: with two-way FE you need to demean the time dummies as well
*** load the dataset and drop some of the observations
webuse grunfeld, clear
gen randomnumber=uniform()
drop if randomnumber>0.8
drop randomnumber
*** run the regressions
*** -xtreg,fe-
xi i.time, pre(D)
qui xtreg inv mval kstock D*, fe r
est store xtreg
*** demeaning done right (also the time dummies)
foreach var of varlist invest mvalue kstock D* {
egen mean_`var'_com = mean(`var'), by(com)
gen demean_`var' = `var' - mean_`var'_com
}
qui reg demean_invest demean_mvalue demean_kstock demean_D*, r
est store demeaned
*** use -xtdata,fe- to demean data
tsset company year
xtdata company year invest mvalue kstock time, fe clear
tsset year company
xtdata, fe clear
reg invest mval kst, r
est store xtdata
est table xtreg demeaned xtdata, ///
keep(mvalue kstock demean_mvalue demean_kstock) se
*** END CODE
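As a possible alternative to carrying the demeaned time dummies along,
here is a minimal sketch. It is not part of the original example and
is not meant as an explanation of what -xtdata,fe- does internally. It
alternates between demeaning by company and demeaning by time: in a
balanced panel one pass suffices, while in an unbalanced panel the two
steps have to be repeated until the variables stop changing. The fixed
number of 100 passes, the seed, and the names dm_* and _mean are
assumptions made for illustration. Up to the convergence tolerance,
the point estimates from the final -reg- should agree with those from
-xtreg,fe- with time dummies; the standard errors will not, because
-reg- does not count the absorbed fixed effects in its degrees of
freedom.

*** BEGIN CODE
*** Part (3), a sketch: repeated demeaning in both dimensions
webuse grunfeld, clear
*** arbitrary seed (an assumption)
set seed 12345
gen randomnumber=uniform()
drop if randomnumber>0.8
drop randomnumber
tsset company year
*** reference: -xtreg,fe- with time dummies
xi, prefix(D) i.time
qui xtreg invest mvalue kstock D*, fe
est store xtreg
*** work on double-precision copies of the variables
foreach var of varlist invest mvalue kstock {
    gen double dm_`var' = `var'
}
*** alternate between company and time demeaning (100 passes assumed to be enough)
forvalues pass = 1/100 {
    foreach var of varlist dm_invest dm_mvalue dm_kstock {
        foreach dim of varlist company time {
            qui egen double _mean = mean(`var'), by(`dim')
            qui replace `var' = `var' - _mean
            drop _mean
        }
    }
}
qui reg dm_invest dm_mvalue dm_kstock
est store iterated
est table xtreg iterated, ///
keep(mvalue kstock dm_mvalue dm_kstock) b(%12.6f)
*** END CODE

For a larger or less well-connected panel, a fixed number of passes
may not be enough; one would then track the largest change in the
variables between passes and stop once it falls below a chosen
tolerance.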
2009/7/2 Davide Cantoni:
> A while ago, a post on this forum discussed the equivalence of
> -xtreg,fe-, manual demeaning and -xtdata, fe- to estimate fixed
> effects models, showing that all three methods would lead to the same
> point estimates (although, as the post did not mention, the standard
> errors will differ because of the incorrect DoF adjustment):
>
> http://www.stata.com/statalist/archive/2004-07/msg00230.html
>
> Now my question relates to unbalanced panels. The example by Scott
> Merryman above used a perfectly balanced panel. His results do not
> carry over, however, when the panel is unbalanced, as in this example
> (modeled on the previous one):
>
> *** begin code ***
> *** drop some of the observations
>
> webuse grunfeld, clear
> gen randomnumber=uniform()
> drop if randomnumber>0.8
> drop randomnumber
>
> *** run the same regressions
>
> xi i.time, pre(D)
> qui xtreg inv mval kstock D*, fe r
> est store xtreg
> qui areg inv mval kstock D*, r a(company)
> est store areg
>
> *Demean in time dimension
> foreach var of varlist invest mvalue kstock {
> egen mean_`var'_time = mean(`var'), by(time)
> gen demean_`var' = `var' - mean_`var'_time
> }
> qui xtreg demean_invest demean_mvalue demean_kstock, fe r
> est store demeaned
>
> *Demean in the time and cross-section dimensions
> foreach var of varlist invest mvalue kstock {
> egen mean_`var'_time_com = mean(demean_`var'), by(com)
> gen demean2_`var' = demean_`var' - mean_`var'_time_com
> }
> qui reg demean2_invest demean2_mvalue demean2_kstock, r
> est store demeaned2
>
> *Use -xtdata,fe- to demean data
> tsset company year
> xtdata company year invest mvalue kstock time, fe clear
> tsset year company
> xtdata, fe clear
> reg invest mval kst, r
> est store xtdata
>
> est table xtreg areg demeaned demeaned2 xtdata, ///
> keep(mvalue kstock demean_mvalue demean_kstock ///
> demean2_mvalue demean2_kstock ) se
>
> *** end code ***
>
> While the results based on manual demeaning will be slightly off the
> mark (by about 5%, as a ballpark figure), the ones based on -xtdata-
> will be really far off, sometimes up to the point of having no
> estimate at all due to insufficient variation in the observations (of
> course, this will depend on your draw of "randomnumber" above).
>
> So my questions are:
> - why is the -xtdata,fe- method so far off, i.e. what is the algorithm
> that this command follows which is failing when the panel is
> unbalanced?
> - more importantly: what do you suggest to do if one has an unbalanced
> panel which is too large to be handled with -xtreg,fe-? (matsize
> problems etc.)
>
> Thanks for your thoughts and suggestions,
>
> Davide
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>