Daniel Feenberg <[email protected]> wrote:
> Most requests here at NBER for Stata-SE are from users with fixed
> effect models who expect to add a dummy variable for each respondent
> in a panel. They are usually easily convinced that this is not
> necessary. However sometimes users want to interact a time trend
> with the fixed effect. Is there a way to estimate such a model
> without adding a variable for each respondent?
Short answer
------------
One way to estimate this type of model is to double difference the data and
estimate the parameters via ordinary least squares with cluster-robust
standard errors.
Long answer
-----------
Consider the model

    y_it = u_i + a_i*t + B x_it + e_it

where

    y_it  is the dependent variable
    u_i   is the unobserved individual-specific intercept, which may be
          correlated with a_i and x_it
    a_i   is the unobserved individual-specific trend, which may be
          correlated with u_i and x_it
    x_it  is a vector of time-varying covariates, which may be correlated
          with a_i and u_i
    B     is a vector of coefficients on x_it
    e_it  is an idiosyncratic error that is independently distributed over
          the panels

(Note: the e_it may have some serial correlation, and the assumption of
independence over the panels is stronger than necessary.)
Let's begin with the case in which there are no gaps within the panels.
(We drop this assumption below.) The number of observations per panel may
vary. First-differencing the data removes the individual-specific intercept:

    D.y_it = a_i + B D.x_it + D.e_it

This is a standard fixed-effects model, the parameters of which could be
estimated by -xtreg, fe-. As with the simple fixed-effects model, we could
instead estimate the parameters by differencing the data and applying
ordinary least squares. Differencing the data again yields

    D2.y_it = B D2.x_it + D2.e_it
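
To spell out why each round of differencing removes one of the
individual-specific terms, the algebra can be written out as follows. This is
only a sketch in LaTeX notation using the symbols of the model above; \Delta
stands for Stata's D. operator and \Delta^2 for D2., and t is assumed to
advance in steps of one.

```latex
\begin{align*}
\Delta y_{it} &= y_{it} - y_{i,t-1}
  = (u_i - u_i) + a_i\,[t-(t-1)] + B\,\Delta x_{it} + \Delta e_{it}
  = a_i + B\,\Delta x_{it} + \Delta e_{it}, \\
\Delta^2 y_{it} &= \Delta y_{it} - \Delta y_{i,t-1}
  = (a_i - a_i) + B\,\Delta^2 x_{it} + \Delta^2 e_{it}
  = B\,\Delta^2 x_{it} + \Delta^2 e_{it}.
\end{align*}
```

The first difference cancels u_i and turns the trend a_i*t into the constant
a_i; the second difference cancels that constant, which is why the final
regression needs no intercept.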
Recall that I began by assuming that there were no gaps in the data. The
assumption of no gaps is crucial if one wants to apply the standard FE
estimator to the first-differenced data. The assumption is not necessary for
the double-difference model, because gaps simply cause a loss of
observations.
Here is an example that simulates some data and runs the regressions.
First, let's simulate some data.
------------------- begin data generation section -------------------------
. clear
. set seed 12345
. set mem 50m
(51200k)
.
. set obs 500
obs was 0, now 500
.
. gen id = _n
.
. gen ui = invchi2(2,uniform())
.
. gen ai = invnorm(uniform()) +.3*ui
.
. expand 10
(4500 observations created)
.
. sort id
. by id: gen t = _n
.
. tsset id t
panel variable: id, 1 to 500
time variable: t, 1 to 10
.
. gen x1 = invchi2(2,uniform()) + .5*t + .3*ui
. gen x2 = invchi2(2,uniform()) + .7*t + .4*ui
.
. gen eit =invchi2(2,uniform())
.
. gen y = ui + ai*t + 1*x1 + 2*x2 + eit
------------------- end data generation section -------------------------
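
The log above uses functions from older Stata releases (uniform(), invchi2(),
invnorm(), and -set mem-). For readers on a more recent release, the same
data-generating process might be sketched as below. This assumes a release in
which rchi2(), rnormal(), and -xtset- are available; it has not been run here,
and because the random-number functions differ, the draws and therefore the
estimates will not match the output shown in this post.

```stata
* Sketch only: same design as the log above, written with newer functions
clear
set seed 12345
set obs 500

generate id = _n
generate ui = rchi2(2)                 // individual-specific intercept
generate ai = rnormal() + .3*ui        // individual-specific trend, correlated with ui

expand 10                              // 10 time periods per panel
bysort id: generate t = _n
xtset id t

* Covariates depend on t and on ui, so they are correlated with ui and ai
generate x1 = rchi2(2) + .5*t + .3*ui
generate x2 = rchi2(2) + .7*t + .4*ui

generate eit = rchi2(2)                // non-normal idiosyncratic error
generate y = ui + ai*t + 1*x1 + 2*x2 + eit
```
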
The data generating process is standard. Note that a_i, u_i and x_it are
all correlated with each other. Removing these correlations would allow you
to use other estimators. Just to highlight that normality is not required,
I avoided using normal errors. (I made the a_i normal to illustrate that
the individual specific time trends need not all have the same sign.)
The correlation between a_i and u_i is such that the FE estimator will be
inconsistent.
. xtreg y x1 x2, fe
Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i): id                          Number of groups   =       500

R-sq:  within  = 0.8094                         Obs per group: min =        10
       between = 0.5605                                        avg =      10.0
       overall = 0.6106                                        max =        10

                                                F(2,4498)          =   9552.53
corr(u_i, Xb)  = 0.1865                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.258295   .0282743    44.50   0.000     1.202864    1.313727
          x2 |   2.424745   .0244308    99.25   0.000     2.376848    2.472641
       _cons |   3.570195   .1781392    20.04   0.000     3.220954    3.919435
-------------+----------------------------------------------------------------
     sigma_u |  7.2627525
     sigma_e |  4.2590515
         rho |  .74410687   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(499, 4498) =    27.96           Prob > F = 0.0000

Ordinary least squares on the double-differenced data produces consistent
estimates. I clustered on -id- to account for the within-panel serial
correlation that is present even if the original error e_it has no serial
correlation.
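
To see where that serial correlation comes from, suppose (purely as an
illustrative assumption, not something the model requires) that e_it is
independent over t with constant variance \sigma^2. A short worked
calculation in LaTeX notation:

```latex
\begin{align*}
\Delta^2 e_{it} &= e_{it} - 2e_{i,t-1} + e_{i,t-2}, \\
\operatorname{Var}(\Delta^2 e_{it}) &= (1 + 4 + 1)\,\sigma^2 = 6\sigma^2, \\
\operatorname{Cov}(\Delta^2 e_{it},\, \Delta^2 e_{i,t-1}) &= (-2 - 2)\,\sigma^2 = -4\sigma^2, \\
\operatorname{Cov}(\Delta^2 e_{it},\, \Delta^2 e_{i,t-2}) &= \sigma^2 .
\end{align*}
```

Under that assumption the double-differenced error is an MA(2) process, with
autocorrelation -2/3 at lag 1, 1/6 at lag 2, and zero at longer lags, so
standard errors that ignore within-panel dependence would be unreliable.
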
. reg d2.(y x1 x2), nocons cluster(id)
Regression with robust standard errors                 Number of obs =    4000
                                                       F(  2,   499) = 5358.14
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8375
Number of clusters (id) = 500                          Root MSE      =  4.9086

------------------------------------------------------------------------------
             |               Robust
        D2.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |    1.00929   .0212335    47.53   0.000     .9675721    1.051008
x2           |
          D2 |   2.010881   .0213753    94.07   0.000     1.968884    2.052877
------------------------------------------------------------------------------

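
For comparison, the alternative mentioned earlier (the FE estimator applied to
the first-differenced data) might be sketched as follows. This is not part of
the original log and has not been run here, so no output is shown. The
variable names dy, dx1, and dx2 are introduced only for illustration, and
vce(cluster id) assumes a Stata release in which -xtreg, fe- accepts that
option.

```stata
* Sketch only: first-difference by hand, then apply the within estimator.
* The panel fixed effect in this regression plays the role of a_i.
generate dy  = D.y
generate dx1 = D.x1
generate dx2 = D.x2

* D.e_it is serially correlated within panel, so cluster on id
xtreg dy dx1 dx2, fe vce(cluster id)
```

As noted above, this route relies on there being no gaps within the panels.
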
Now let's illustrate that gaps in the panels cause the expected loss of
observations.
. replace y = . if t == 5
(500 real changes made, 500 to missing)
.
. reg d2.(y x1 x2), nocons cluster(id)
Regression with robust standard errors                 Number of obs =    2500
                                                       F(  2,   499) = 3906.51
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8376
Number of clusters (id) = 500                          Root MSE      =  4.9251

------------------------------------------------------------------------------
             |               Robust
        D2.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |   1.038029   .0258706    40.12   0.000     .9872002    1.088858
x2           |
          D2 |   2.006183   .0253875    79.02   0.000     1.956303    2.056062
------------------------------------------------------------------------------

David
[email protected]