RE: st: Stata-SE and Stata on the same server

From	"David M. Drukker, StataCorp" <[email protected]>
To	[email protected]
Subject	RE: st: Stata-SE and Stata on the same server
Date	Wed, 16 Jun 2004 10:02:54 -0500
Daniel Feenberg <[email protected]> wrote

> Most requests here at NBER for Stata-SE are from users with fixed
> effect models who expect to add a dummy variable for each respondent
> in a panel.  They are usually easily convinced that this is not
> necessary. However sometimes users want to interact a time trend
> with the fixed effect. Is there a way to estimate such a model
> without adding a variable for each respondent?


Short-answer
------------

One way to estimate this type of model is to double difference the data and
estimate the parameters via ordinary least squares with cluster-robust
standard errors.

Long-answer
------------

Consider the model 

     y_it = u_i + a_i*t + B x_it + e_it
    
where y_it is the dependent variable
      u_i  is the unobserved individual specific intercept that may be
           correlated with a_i and x_it 
      a_i  is the unobserved individual specific trend, which may be 
           correlated with u_i and x_it
      x_it is a vector of time-varying covariates, which may be correlated 
           with a_i and u_i
      B    is a vector of coefficients on x_it
      e_it is idiosyncratic error that is independently distributed over
           the panels

	   (Notes: the e_it may have some serial correlation and the
	   independence over the panels is unnecessarily strong.)

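Let's begin with the case in which there are no gaps within the panels.
(We drop this assumption below.)  The number of observations per panel may
vary.  First differencing the data removes the individual specific intercept
and, because a_i*t - a_i*(t-1) = a_i, turns the individual specific trend
into an individual specific intercept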

	D.y_it =  a_i + B D.x_it + D.e_it

This is a standard fixed effects model, the parameters of which could be
estimated by -xtreg, fe-.  As with the simple fixed-effects model, we could
instead estimate the parameters by differencing the data and applying
ordinary least squares.  Differencing the data again yields

	D2.y_it =   B D2.x_it + D2.e_it

Recall that at the beginning of this example, I assumed that there were no
gaps in the data.  The assumption of no gaps is crucial if one wants to apply
the standard FE estimator on the first-differenced data.  The assumption is
not necessary for the double difference model because the gaps will simply
cause a loss of observations. 
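
A minimal sketch of the intermediate first-difference route, assuming the
data have been -tsset- and contain variables named y, x1 and x2 as in the
simulation below.  The names dy, dx1 and dx2 are hypothetical and chosen
only for illustration.

------------------- begin first-difference sketch -------------------------
* first-difference each variable; D. respects the -tsset- panel and time
gen dy  = D.y
gen dx1 = D.x1
gen dx2 = D.x2

* the a_i are now panel-specific intercepts, so the within estimator applies
xtreg dy dx1 dx2, fe
------------------- end first-difference sketch -------------------------

The default standard errors from this regression do not allow for the
serial correlation that differencing induces in D.e_it, so they should be
read with caution.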

Here is an example that simulates some data and runs the regressions.


First, let's simulate some data.

------------------- begin data generation section -------------------------
. clear

. set seed 12345

. set mem 50m
(51200k)

. 
. set obs 500
obs was 0, now 500

. 
. gen id = _n

. 
. gen ui = invchi2(2,uniform())

. 
. gen ai = invnorm(uniform()) +.3*ui

. 
. expand 10
(4500 observations created)

. 
. sort id

. by id: gen t = _n

. 
. tsset id t
       panel variable:  id, 1 to 500
        time variable:  t, 1 to 10

. 
. gen x1 = invchi2(2,uniform()) + .5*t + .3*ui

. gen x2 = invchi2(2,uniform()) + .7*t + .4*ui

. 
. gen eit =invchi2(2,uniform())

. 
. gen y = ui + ai*t + 1*x1 + 2*x2 + eit

------------------- end data generation section -------------------------

The data generating process is standard.  Note that a_i, u_i and x_it are
all correlated with each other.  Removing these correlations would allow you
to use other estimators.  Just to highlight that normality is not required,
I avoided using normal errors.  (I made the a_i normal to illustrate that
the individual specific time trends need not all have the same sign.)

The correlation between a_i and u_i is such that the FE estimator will be
inconsistent.

. xtreg y x1 x2, fe

Fixed-effects (within) regression               Number of obs      =      5000
Group variable (i): id                          Number of groups   =       500

R-sq:  within  = 0.8094                         Obs per group: min =        10
       between = 0.5605                                        avg =      10.0
       overall = 0.6106                                        max =        10

                                                F(2,4498)          =   9552.53
corr(u_i, Xb)  = 0.1865                         Prob > F           =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.258295   .0282743    44.50   0.000     1.202864    1.313727
          x2 |   2.424745   .0244308    99.25   0.000     2.376848    2.472641
       _cons |   3.570195   .1781392    20.04   0.000     3.220954    3.919435
-------------+----------------------------------------------------------------
     sigma_u |  7.2627525
     sigma_e |  4.2590515
         rho |  .74410687   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(499, 4498) =    27.96           Prob > F = 0.0000


Ordinary least squares on the double-differenced data produces consistent
estimates.  I clustered on -id- to account for the within-panel serial
correlation that is present even if the original error e_it has no serial
correlation.

. reg d2.(y x1 x2), nocons cluster(id)

Regression with robust standard errors                 Number of obs =    4000
                                                       F(  2,   499) = 5358.14
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8375
Number of clusters (id) = 500                          Root MSE      =  4.9086

------------------------------------------------------------------------------
             |               Robust
D2.y         |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |    1.00929   .0212335    47.53   0.000     .9675721    1.051008
x2           |
          D2 |   2.010881   .0213753    94.07   0.000     1.968884    2.052877
------------------------------------------------------------------------------
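
To make explicit what the D2. operator does in the regression above, the
double difference can be written out by hand.  This is only a sketch,
assuming the simulated data are still in memory and -tsset-; d2y_check is a
hypothetical name used just for this check.

------------------- begin double-difference check -------------------------
* D2.y is the difference of the first difference: y_t - 2*y_(t-1) + y_(t-2)
gen double d2y_check = y - 2*L.y + L2.y

* the hand-built version should agree with the operator wherever it exists
assert reldif(d2y_check, D2.y) < 1e-6 if d2y_check < .
------------------- end double-difference check -------------------------

Applying the same weights to the model terms shows why both individual
effects vanish: u_i - 2*u_i + u_i = 0 and
a_i*t - 2*a_i*(t-1) + a_i*(t-2) = 0.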


Now let's illustrate that gaps in the panels cause the expected loss of
observations.

 
. replace y = . if t == 5
(500 real changes made, 500 to missing)

. 
. reg d2.(y x1 x2), nocons cluster(id)

Regression with robust standard errors                 Number of obs =    2500
                                                       F(  2,   499) = 3906.51
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8376
Number of clusters (id) = 500                          Root MSE      =  4.9251

------------------------------------------------------------------------------
             |               Robust
D2.y         |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1           |
          D2 |   1.038029   .0258706    40.12   0.000     .9872002    1.088858
x2           |
          D2 |   2.006183   .0253875    79.02   0.000     1.956303    2.056062
------------------------------------------------------------------------------
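
One way to see which periods survive the gap, sketched under the assumption
that the regression above was the last estimation command run (used is a
hypothetical variable name):

------------------- begin estimation-sample check -------------------------
* flag the observations that entered the last regression
gen byte used = e(sample)

* count the retained observations by time period
tabulate t if used
------------------- end estimation-sample check -------------------------

With y missing at t == 5, D2.y cannot be formed at t = 5, 6 or 7, and it is
never available at t = 1 or 2.  Only t = 3, 4, 8, 9 and 10 should remain,
and 5 periods times 500 panels gives the 2500 observations reported above.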

  David
  [email protected]