Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Fernando Rios Avila <f.rios.a@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Working with really large Datasets |
Date | Mon, 15 Oct 2012 18:52:17 -0400 |
Dear Statalisters,

I wonder if anyone here can share some experience with working on really large datasets. I'm working with a panel dataset (census-type data) on workers and firms over time. The total number of observations is about 70 million. I want to estimate two-way fixed effects models, manually including dummies for regions, time, and industries. However, with a dataset of this size, the estimation becomes unmanageable.

Does anyone know of, or can anyone direct me to, a strategy for dealing with "too much data"?

I was thinking about drawing random samples (say 5%), picking individuals at random and keeping them for the whole time they appear in the data, and then combining the results across samples, in a similar fashion to what is done with multiple imputation datasets. But I'm not sure how valid that procedure would be.

Any suggestions are welcome. Thank you.

Fernando Rios
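The sampling step described above (pick individuals at random, then keep every period in which they appear) could be sketched roughly as follows. This is only an illustration in Python rather than Stata; the function name `sample_panel_units`, the column names `worker_id` and `year`, and the toy panel are all hypothetical and do not come from the original post.

```python
import random


def sample_panel_units(rows, id_key, fraction, seed=12345):
    """Return all rows belonging to a random subset of panel units.

    Sampling is done on unit identifiers, not on rows, so every selected
    unit keeps its complete observed history.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Unique unit identifiers, sorted so the draw is reproducible for a seed.
    ids = sorted({row[id_key] for row in rows})
    if not ids:
        return []

    # Number of units to keep (at least one).
    n_keep = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)
    kept_ids = set(rng.sample(ids, n_keep))

    # Keep every row whose unit was selected.
    return [row for row in rows if row[id_key] in kept_ids]


# Toy unbalanced panel (hypothetical): 100 workers observed for 1 to 5 years.
panel = [
    {"worker_id": w, "year": y}
    for w in range(1, 101)
    for y in range(2000, 2000 + (w % 5) + 1)
]

# Draw 5% of workers and keep all of their observations.
subsample = sample_panel_units(panel, "worker_id", fraction=0.05)
```

Repeating the draw with different seeds would give several such subsamples. Whether estimates from those subsamples can be validly combined in the way multiple imputation results are combined is the open question of the post, and this sketch does not address it.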