Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Random Sample Selection in Panel Data
Date: Fri, 13 May 2011 14:57:54 +0100
One way to tackle this is to perform the sample selection on a dataset with just one observation per identifier. Then you -merge- with the main dataset.

Equivalently, tag just one observation per identifier, sample within that subset, and then expand to include all observations for each identifier. -egen, max()- is one way to do the expansion.

In fact

    . search sample, faq

shows that this is an FAQ, and that you could have identified relevant material directly within Stata, e.g.

    FAQ     . . . . . . . . . . . Sampling clusters, not individuals
            . . . . . . . . . . . N. J. Cox and S. Merryman
            5/06    How can I sample clusters, not individuals?
                    http://www.stata.com/support/faqs/data/sampleby.html

Nick
n.j.cox@durham.ac.uk

Dennis Kramer

I have a large panel data set (4 years, 250,000+ records per year) and I want to generate four random sample groups to test the stability of the estimates. However, I want to ensure that if an ID is selected in Year 1, then it is subsequently selected into the same random sample for Years 2, 3, and 4.

I know that for cross-sectional random sampling the code is as follows:

    generate rannum = uniform()
    egen grp2 = cut(rannum), group(4)

Does anyone have any insight into modifying the above syntax to automatically include the Year 2, 3, and 4 observations in the same sample as the selected Year 1 ID?
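The tag-sample-expand route described in the reply might look something like the sketch below. It is only an illustration, not code from the FAQ: the identifier name id, the seed, and the new variable names tag, grp0 and grp are all assumptions, and runiform() is used in place of the older uniform() from the question.

```stata
* Minimal sketch, assuming the panel identifier is named id (hypothetical name)
set seed 2011                          // arbitrary seed, only for reproducibility

* Tag exactly one observation per identifier
egen byte tag = tag(id)

* Draw a random number for the tagged observations only
generate double rannum = runiform() if tag

* Cut the tagged observations into four groups of roughly equal size;
* untagged observations have missing rannum and so get missing grp0
egen grp0 = cut(rannum), group(4)

* Spread each identifier's group to all of its observations;
* max() ignores the missing values on the untagged observations
egen grp = max(grp0), by(id)

drop tag rannum grp0
```

Every observation with the same id then carries the same value of grp (0 to 3), so all years for an identifier fall in the same random group.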