Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Regression with about 5000 (dummy) variables
From: Suryadipta Roy <[email protected]>
To: [email protected]
Subject: Re: st: Regression with about 5000 (dummy) variables
Date: Thu, 19 Apr 2012 11:02:29 -0400
Prof. Antonakis,
Thank you so much! I will work on your suggestions and will
definitely let you know if they work.
Best regards,
Suryadipta.
On Thu, Apr 19, 2012 at 10:57 AM, John Antonakis <[email protected]> wrote:
> Hi:
>
> Let me let you in on a relatively unknown trick.
>
> One way around the problem of a huge number of dummy variables is to use the
> Mundlak procedure:
>
> Mundlak, Y. (1978). Pooling of Time-Series and Cross-Section Data.
> Econometrica, 46(1), 69-85.
>
> For an intuitive explanation, see:
>
> Antonakis, J., Bendahan, S., Jacquart, P., & Lalive, R. (2010). On making
> causal claims: A review and recommendations. The Leadership Quarterly,
> 21(6), 1086-1120. http://www.hec.unil.ch/jantonakis/Causal_Claims.pdf
>
> Basically, for each time-varying independent variable (x1-x4), take the
> cluster mean and include it in the regression. That is, do:
>
> * cluster (panel-level) mean of each time-varying regressor
> foreach var of varlist x1-x4 {
>     bysort panelvar: egen cl_`var' = mean(`var')
> }
>
> Then, run your regression like this (random effects is the -xtreg- default):
>
> xtreg y x1-x4 cl_x1-cl_x4, re vce(cluster panelvar)
>
> A Hausman-type test of fixed versus random effects is then the joint test
> that the coefficients on the cluster means are zero:
>
> testparm cl_x1-cl_x4
>
> This will save you on degrees of freedom and computational requirements.
> This estimator is consistent. Try it out with a subsample of your dataset
> to see. Many econometricians have been amazed by this.
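
For reference, the regression described above can be written out as follows. This is only a sketch in standard panel notation, not something taken from the original message: the symbols (alpha, beta, gamma, u_i, T_i) are assumptions introduced here for illustration.

```latex
% Mundlak-type regression: original regressors plus their cluster means
y_{it} = \alpha + x_{it}'\beta + \bar{x}_i'\gamma + u_i + \varepsilon_{it},
\qquad
\bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}

% Null hypothesis tested by -testparm- on the cluster means
H_0:\ \gamma = 0
```

Here x_it stacks x1-x4, \bar{x}_i stacks cl_x1-cl_x4, and u_i is the random effect for panelvar. Rejecting H_0 indicates that plain random effects (without the cluster means) would be inconsistent, which favours fixed effects. In a balanced panel, the estimate of beta from this regression coincides with the fixed-effects (within) estimate.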
>
> HTH,
> J.
>
> __________________________________________
>
> Prof. John Antonakis
> Faculty of Business and Economics
> Department of Organizational Behavior
> University of Lausanne
> Internef #618
> CH-1015 Lausanne-Dorigny
> Switzerland
> Tel ++41 (0)21 692-3438
> Fax ++41 (0)21 692-3305
> http://www.hec.unil.ch/people/jantonakis
>
> Associate Editor
> The Leadership Quarterly
> __________________________________________
>
>
>
> On 19.04.2012 16:39, Suryadipta Roy wrote:
>> Dear Statalisters,
>>
>> I am trying to run a fixed-effects panel regression that has more
>> than 4000 dummies (based on theory in the gravity model literature in
>> international economics), and hence close to 5000 variables in the
>> regression. The coefficients of the dummy variables are not of any
>> interest. The code is as follows:
>>
>> xtreg y x1 x2...... imp_time_* exp_time_*, fe cluster(panelvar)
>>
>> where panelvar has been set using -xtset-, and imp_time_* and exp_time_*
>> are importer-time and exporter-time fixed effects respectively. However,
>> the regression ran for close to 2 hours without producing any result, at
>> which point I stopped it using -Break-. I had set the memory to 5000m and
>> the matsize to 5000 using -set-.
>>
>> My Stata specification is Stata/SE 11.2 for Windows (64-bit x86-64).
>> My PC specification: Processor: Intel Core i5-2430M CPU @ 2.40GHz;
>> RAM: 8 GB, on a 64-bit OS.
>>
>> I would greatly appreciate help finding out whether it is normal for
>> Stata to take this much time (or more) with such a large number of
>> variables, and whether there is a way to accomplish the task faster.
>> The gravity literature has suggested a couple of ways to do this
>> without the dummy-variable approach, but I am trying to find out
>> whether there is a better way to do it if I persist with the dummy
>> variables. Any help is greatly appreciated.
>>
>> Best regards,
>> Suryadipta.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/