Re: st: Three Fixed Effects with Millions of Observations
From: Fernando Rios Avila <[email protected]>
To: [email protected]
Date: Wed, 19 Mar 2014 17:28:05 -0400
Hi George,
There are a couple of options you can use to estimate your model.
However, because you are dealing with such a large data set, you will
need a lot of patience.
You can always try to estimate the model using the standard -areg- or
-xtreg, fe- commands, among other possibilities.
For a quick review of some of the available commands, you can check the
following paper:
http://www.stata-journal.com/article.html?article=st0267
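As a rough sketch of the -areg- and -xtreg, fe- routes: the variable names
(y, x, firm, time, county) are taken from your question, and I am assuming
the number of dummies left in the model stays within your Stata's matsize
limit, which may well not hold with this many groups.

```stata
* Sketch only: names follow the question; feasibility depends on matsize.
* Absorb the category with the most values (county) and enter the other
* two sets of effects as dummies, clustering at the firm level.
areg y x i.firm i.time, absorb(county) vce(cluster firm)

* Alternative: declare firm as the panel variable so -xtreg, fe- absorbs it,
* and enter time and county as dummies.
xtset firm
xtreg y x i.time i.county, fe vce(cluster firm)
```

Note that -xtreg, fe- expects the panels to be nested within the clusters,
which is why the second call absorbs firm rather than county.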
Now, for a nonlinear model such as a Poisson model, I would suggest
checking the paper by Paulo Guimaraes and Pedro Portugal:
www.stata-journal.com/article.html?article=st0212
Their method can potentially be applied to more than two fixed effects
by including the third one as a set of dummy variables.
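To illustrate the idea of absorbing one effect and entering the rest as
dummies, here is a sketch that uses the built-in conditional fixed-effects
Poisson. It is not the Guimaraes and Portugal procedure itself. Variable
names come from the question, and I am assuming a Stata version in which
-xtpoisson, fe- accepts vce(robust).

```stata
* Sketch only: built-in conditional FE Poisson, not the Guimaraes-Portugal code.
* firm is conditioned out; time and county enter as dummy variables.
xtset firm
xtpoisson y x i.time i.county, fe vce(robust)
```

With a very large number of counties the dummy set may exceed what Stata
can handle, in which case the iterative approach from the paper is needed.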
For a more direct approach, the paper "OLS with multiple high
dimensional category variables" by Simen Gaure proposes a method
and provides an implementation in R:
Paper: http://www.sciencedirect.com/science/article/pii/S0167947313001266
R code: http://cran.r-project.org/web/packages/lfe/index.html
Finally, although it does not include the cluster correction, I suggest
a Stata implementation of an algorithm similar to the Guimaraes and
Portugal strategy and closer to Gaure's, which directly handles the
case of 3 or more fixed effects:
http://www.levyinstitute.org/publications/?docid=1971
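The general idea behind these algorithms can be sketched as repeated
demeaning by each group until nothing is left to remove. The code below is
my own minimal illustration, not the code from the linked paper. Variable
names follow the question, the tolerance and iteration cap are arbitrary
choices, and it assumes no missing values in y or x. It gives point
estimates only.

```stata
* Sketch only: alternating demeaning for three fixed effects.
foreach v of varlist y x {
    generate double `v'_dm = `v'
    local change = 1
    local iter = 0
    while `change' > 1e-8 & `iter' < 1000 {
        local change = 0
        foreach g of varlist firm time county {
            tempvar m
            * group mean of the current residualized variable
            bysort `g': egen double `m' = mean(`v'_dm)
            quietly replace `v'_dm = `v'_dm - `m'
            * track the largest group mean removed in this pass
            quietly summarize `m'
            local change = max(`change', r(max), -r(min))
            drop `m'
        }
        local ++iter
    }
}
* The slope matches the dummy-variable regression; the reported standard
* errors do not, because the degrees of freedom ignore the absorbed effects.
regress y_dm x_dm, noconstant
```

On 10 million observations each pass re-sorts the data, so this will be
slow; it is meant to show the logic rather than to be efficient.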
Best
On Wed, Mar 19, 2014 at 5:06 PM, George Shoukry <[email protected]> wrote:
> I have a data set with over 10 million observations and each
> observation is uniquely identified by three variables (say time, firm,
> county). I would like to include fixed effects for the three
> identifying variables, cluster the standard errors at the firm level,
> and run OLS and Poisson regressions for some variables in the data. I
> have two questions:
>
> 1. Ideally I want to do "reg y x i.firm i.time i.county, vce(cluster
> firm)", but this takes too long (not sure exactly how long because I
> stopped it after a while). So far I've been able to get OLS estimates
> on my computer using the undocumented _regress command with the
> absorb() option. The county identifier has the most number of values,
> so I do something like "_regress y x i.firm i.time, absorb(county)".
> The problem is that I cannot seem to cluster the errors at the firm
> level with the _regress command and I can't find documentation for it.
> Any ideas on the fastest way in Stata to obtain OLS estimates in this
> case with clustered errors?
> Note: I tried some other options but they seem to take too long (how
> long do you leave commands running before you stop them?).
>
> 2. Any experience with the best way to run a fixed-effects Poisson
> regression with a large dataset and several fixed effects?
>
> Thanks!
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/