Dear Stata-list,
I am using a data set of 963,966 observations, with 26 variables (after
dropping all variables not needed for my estimation). The observations are
dyadic: I have roughly (1400 squared)/2 pairs of units (divided by 2 because
the relationship is non-directional), so in the regressions I need to
control for 1400*2 dummy variables. I run a regression of the form:
xi: reg y x1 x2 x3 i.observation1 i.observation2
where my dataset consists of dyadic relationships between each
observation1 and each observation2.
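In equation form, with i indexing observation1 and j indexing observation2 (this notation is introduced here only for illustration), the specification above reads:

```latex
y_{ij} = \beta_1 x_{1,ij} + \beta_2 x_{2,ij} + \beta_3 x_{3,ij} + \alpha_i + \gamma_j + \varepsilon_{ij},
\qquad i, j \in \{1, \dots, 1400\}
```

Here alpha_i and gamma_j are the coefficients on the two sets of roughly 1,400 dummies each, and beta_1 to beta_3 are the coefficients on x1, x2 and x3.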
The problem I run into is that each regression takes an incredibly long
time (and the server crashes regularly).
As an alternative, I use Fafchamps and Gubert's NGREG, running:
xi: ngreg y x1 x2 x3, id(observation1 observation2)
This also takes an incredibly long time.
My question is: is there a more efficient way to run regressions in Stata
with such an enormous number of dummy variables?
PS: I do not care about the coefficients on the dummies per se.
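A minimal sketch of one standard shortcut, on simulated data: because only the coefficients on x1-x3 are needed, one set of dummies can be absorbed with Stata's built-in -areg- instead of being estimated explicitly, and by the Frisch-Waugh-Lovell theorem the slopes on x1-x3 are unchanged. The sketch assumes Stata 11 or later (factor-variable syntax); the 50 simulated units, the seed and the data-generating process are hypothetical stand-ins for the real 1,400-unit data.

```stata
* Sketch on simulated data, assuming Stata 11 or later (factor variables).
* 50 hypothetical units stand in for the 1,400 real ones.
clear
set seed 12345
set obs 50
generate observation1 = _n
expand 50
bysort observation1: generate observation2 = _n
* Non-directional relationship: keep each unordered pair only once
keep if observation1 < observation2
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = rnormal()
* Hypothetical data-generating process with unit effects on both sides
generate y = x1 - 0.5*x2 + 0.25*x3 + observation1/50 + observation2/25 + rnormal()

* Benchmark: both sets of dummies estimated explicitly
regress y x1 x2 x3 i.observation1 i.observation2
scalar b_full = _b[x1]

* Same model, with the observation1 dummies absorbed rather than estimated
areg y x1 x2 x3 i.observation2, absorb(observation1)
scalar b_areg = _b[x1]

* The slope on x1 should agree up to rounding error
display reldif(scalar(b_full), scalar(b_areg))
```

On the real data the corresponding call would be -areg y x1 x2 x3 i.observation2, absorb(observation1)-, which still estimates one set of about 1,400 dummies explicitly; in older Stata versions that may require raising -set matsize- (and Stata/SE or MP). Absorbing both sets at once needs a user-written command; one example is -reghdfe- from SSC, e.g. -reghdfe y x1 x2 x3, absorb(observation1 observation2)-, assuming it has been installed with -ssc install reghdfe-.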
Thank you very much in advance for your response.
Pauline
--
Pauline Grosjean
Ciriacy Wantrup Fellow, Department of Agricultural and Resource Economics
University of California Berkeley
Web page: http://are.berkeley.edu/~pgrosjean/
Mobile: 510 384 0141