Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Regression with about 5000 (dummy) variables
From: [email protected]
To: [email protected]
Subject: Re: st: Regression with about 5000 (dummy) variables
Date: Thu, 19 Apr 2012 15:26:55 +0000
I'm pretty sure Paul D. Allison, in his excellent 2009 monograph "Fixed Effects Regression Models" (Sage QASS Paper 160, Thousand Oaks, CA: Sage), says you should also include the mean-deviation variables, along with the cluster means, in the -test- statement, so you would have eight variables in this example (assuming that x1-x4 are already demeaned in the -xtreg-, which surely they would be?).
Indeed, he also says that this procedure has better statistical properties than the Hausman test.
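A minimal sketch of the two equivalent ways to write this test, using simulated data. The variables id, t, u, x1, x2, y and the dev_ prefix are hypothetical, and the "deviations plus means" setup is assumed to match the one recalled above, not checked against the book. In the Mundlak form (x and its cluster mean) the test is that the cluster-mean coefficients are zero; in the deviation form it is that each deviation coefficient equals the corresponding mean coefficient, which is why that -test- statement names both sets of variables (eight with x1-x4).

```stata
* Simulated, hypothetical panel data: 300 panels, 4 periods each
clear
set seed 2012
set obs 300
generate id = _n
generate u = rnormal()                   // unobserved panel effect
expand 4
bysort id: generate t = _n
generate x1 = rnormal() + 0.5*u          // regressors correlated with u
generate x2 = rnormal() - 0.5*u
generate y = 1 + x1 - x2 + u + rnormal()
xtset id t

foreach var of varlist x1 x2 {
    bysort id: egen cl_`var' = mean(`var')      // cluster (panel) mean
    generate dev_`var' = `var' - cl_`var'       // deviation from the mean
}

* Mundlak form: are the cluster-mean coefficients jointly zero?
xtreg y x1 x2 cl_x1 cl_x2, re vce(cluster id)
testparm cl_x1 cl_x2
scalar chi2_mundlak = r(chi2)

* Deviation form: does each deviation coefficient equal its mean coefficient?
xtreg y dev_x1 dev_x2 cl_x1 cl_x2, re vce(cluster id)
test (dev_x1 = cl_x1) (dev_x2 = cl_x2)
scalar chi2_hybrid = r(chi2)

display chi2_mundlak, chi2_hybrid
```

The deviation form is a reparameterization of the Mundlak form (each mean coefficient in the deviation form equals the within coefficient plus the Mundlak mean coefficient), so the two Wald statistics should agree up to rounding.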
I'm having to transcribe some boring but intelligence-sensitive phone conversations at work right now, so I'm not near a copy to confirm this.
C
-----Original Message-----
From: John Antonakis <[email protected]>
Sender: [email protected]
Date: Thu, 19 Apr 2012 16:57:27
To: <[email protected]>
Reply-To: [email protected]
Subject: Re: st: Regression with about 5000 (dummy) variables
Hi:
Let me let you in on a trick that is relatively unknown.
One way around the problem of a huge number of dummy variables is to use
the Mundlak procedure:
Mundlak, Y. (1978). On the pooling of time series and cross section data.
Econometrica, 46(1), 69-85.
For an intuitive explanation, see:
Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On
making causal claims: A review and recommendations. The Leadership
Quarterly, 21(6), 1086-1120.
http://www.hec.unil.ch/jantonakis/Causal_Claims.pdf
Basically, for each time-varying independent variable (x1-x4), take its
cluster (panel) mean and include that in the regression. That is, do:
foreach var of varlist x1-x4 {
    * cluster (panel) mean of each time-varying regressor
    bysort panelvar: egen cl_`var' = mean(`var')
}
Then, run your regression like this:
* random effects (the -xtreg- default) with cluster-robust standard errors
xtreg y x1-x4 cl_x1-cl_x4, re vce(cluster panelvar)
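One caveat worth checking, shown as a minimal sketch on simulated data (the names id, t, u, x1, y, the seed, and the 10% missing rate are all hypothetical): -egen, mean()- averages over every nonmissing value of the variable, so if some observations later drop out of the regression (here, because y is missing), the cluster means should be restricted to the estimation sample for the coefficient on x1 to line up with -xtreg, fe-.

```stata
* Simulated, hypothetical data with some missing outcomes
clear
set seed 4000
set obs 200
generate id = _n
generate u = rnormal()                   // unobserved panel effect
expand 5
bysort id: generate t = _n
generate x1 = rnormal() + u              // x1 correlated with the panel effect
generate y = 1 + 2*x1 + u + rnormal()
replace y = . if runiform() < 0.1        // roughly 10% missing outcomes
xtset id t

* Cluster mean computed over the estimation sample only
bysort id: egen cl_x1 = mean(x1) if !missing(y, x1)

quietly xtreg y x1, fe
scalar b_fe = _b[x1]
quietly xtreg y x1 cl_x1, re vce(cluster id)
scalar b_mundlak = _b[x1]
display b_fe, b_mundlak
```

With x1-x4, the -if- condition would list y and all four regressors.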
The (cluster-robust) Hausman-type test of fixed versus random effects is
then a joint test that the cluster-mean coefficients are zero:
testparm cl_x1-cl_x4
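For comparison with the conventional test, a minimal sketch on simulated data (names, seed, and parameter values are hypothetical) that runs the official -hausman- command alongside the cluster-mean test. The two statistics are not numerically identical: -hausman- relies on non-robust variances and assumes random effects is fully efficient under the null, whereas the -testparm- version with vce(cluster) remains valid under heteroskedasticity and within-panel correlation.

```stata
* Simulated, hypothetical panel data: 300 panels, 4 periods each
clear
set seed 419
set obs 300
generate id = _n
generate u = rnormal()                   // unobserved panel effect
expand 4
bysort id: generate t = _n
generate x1 = rnormal() + 0.5*u          // x1 correlated with the panel effect
generate y = 1 + 2*x1 + u + rnormal()
xtset id t

* Conventional Hausman test: consistent (fe) first, efficient (re) second
quietly xtreg y x1, fe
estimates store fe
quietly xtreg y x1, re
estimates store re
hausman fe re
scalar p_hausman = r(p)

* Mundlak-style test with cluster-robust variances
bysort id: egen cl_x1 = mean(x1)
quietly xtreg y x1 cl_x1, re vce(cluster id)
testparm cl_x1
scalar p_mundlak = r(p)

display p_hausman, p_mundlak
```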
This saves you degrees of freedom and computation.
The coefficients on x1-x4 reproduce the fixed-effects estimates, so the
estimator is consistent. Try it out on a subsample of your dataset and
compare with -xtreg, fe- to see. Many econometricians have been amazed by this.
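One way to try it out, sketched with simulated data rather than the poster's dataset (the names id, t, u, x1, y, the seed, and the parameter values are hypothetical): the coefficient on x1 from the Mundlak regression should match the -xtreg, fe- estimate, while plain random effects is biased here because x1 is built to be correlated with the panel effect.

```stata
* Simulated, hypothetical panel data: 200 panels, 5 periods each
clear
set seed 1978
set obs 200
generate id = _n
generate u = rnormal()                   // unobserved panel effect
expand 5
bysort id: generate t = _n
generate x1 = rnormal() + u              // x1 correlated with the panel effect
generate y = 1 + 2*x1 + u + rnormal()
xtset id t

* Fixed-effects (within) estimate
quietly xtreg y x1, fe
scalar b_fe = _b[x1]

* Mundlak: random effects plus the cluster mean of x1
bysort id: egen cl_x1 = mean(x1)
quietly xtreg y x1 cl_x1, re vce(cluster id)
scalar b_mundlak = _b[x1]

* Plain random effects without the cluster mean
quietly xtreg y x1, re
scalar b_re = _b[x1]

display "FE: " b_fe "   Mundlak: " b_mundlak "   RE: " b_re
```

The match between the first two is an algebraic identity (the cluster means soak up the between-panel variation in x1, leaving only the within variation to identify its coefficient), so it holds up to rounding error, not just in large samples.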
HTH,
J.
__________________________________________
Prof. John Antonakis
Faculty of Business and Economics
Department of Organizational Behavior
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis
Associate Editor
The Leadership Quarterly
__________________________________________
On 19.04.2012 16:39, Suryadipta Roy wrote:
> Dear Statalisters,
>
> I am trying to run a fixed-effects panel regression which has more
> than 4000 dummies (based on theory in the gravity-model literature in
> international economics), and hence close to 5000 variables in the
> regression. The coefficients of the dummy variables are not of any
> interest. The code is as follows: xtreg y x1 x2...... imp_time_*
> exp_time_*, fe cluster(panelvar), where panelvar has been set using
> -xtset-, and imp_time and exp_time are importer-time and exporter-time
> fixed effects, respectively. However, the regression ran for close to 2
> hours without producing any results, at which point I stopped it using
> -Break-. I had set the memory to 5000m and the matsize to 5000 using
> -set-.
>
> My Stata specification is Stata/SE 11.2 for Windows (64-bit x86-64).
> My PC specification: processor Intel Core i5-2430M CPU @ 2.40 GHz;
> RAM 8 GB, in a 64-bit OS.
>
> I would greatly appreciate some help in finding out whether it is
> normal for Stata to take this much time (or more) in the presence of a
> large number of variables, and whether there is a way to accomplish the
> task faster. The gravity literature has suggested a couple of ways to
> do this without the dummy-variable approach, but I was trying to find
> out if there is a better way to do it if I persist with the dummy
> variables. Any help is greatly appreciated.
>
> Best regards,
> Suryadipta.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/