Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Random Sample Selection in Panel Data
From: Stas Kolenikov <[email protected]>
To: [email protected]
Subject: Re: st: Random Sample Selection in Panel Data
Date: Fri, 13 May 2011 09:31:00 -0500
On Fri, May 13, 2011 at 8:29 AM, Dennis Kramer <[email protected]> wrote:
> I have a large panel data set (4 years, 250,000+ records per year)
> and I want to generate four random sample groups to test the stability
> of the estimates. However, I want to ensure that if an ID is selected
> in Year 1, then it is subsequently selected into the same random
> sample for Years 2, 3, and 4.
>
> I know for a cross-sectional random sampling the code is as follows:
>
> generate rannum = uniform()
> egen grp2 = cut(rannum), group(4)
bysort id (year) : replace grp2 = grp2[1]
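I wouldn't even bother with -egen-, which takes a while with your 1M
observations, and would just replace the -generate- and -egen- lines
above with
generate byte grp2 = ceil( 4*uniform() )
and then run the -bysort- line to carry the group across years.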
The groups will be slightly unbalanced, but with 250K observations,
that's barely an issue. You might have problems if you have a complex
survey structure (PSU/stratum). In that case, it is not quite clear to
me whether you'd want to sample individuals or PSUs or individuals
within PSUs, or what, to test your stability assumption; and besides,
you would need to modify the sampling weights to account for an extra
stage of sampling you introduced.
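
For the simple case without a survey design, the pieces above can be put
together roughly as follows. This is only a sketch: it assumes the panel
variables are named id and year, the seed value is arbitrary, and
runiform() is the newer name for uniform(). The -assert- and -tabulate-
lines are optional checks, not part of the assignment itself.

```stata
* Assumed variable names: id (panel identifier), year (time variable).
set seed 12345                              // arbitrary seed, for reproducibility

* Draw a group number from 1 to 4 for every record...
generate byte grp2 = ceil(4*runiform())

* ...then overwrite it with the value from each id's first year,
* so that an id stays in the same group in all years.
bysort id (year): replace grp2 = grp2[1]

* Optional checks: group is constant within id, and group sizes by year.
bysort id (grp2): assert grp2[1] == grp2[_N]
tabulate grp2 year
```

The estimation can then be run separately for each value of grp2 to
compare the estimates across the four groups.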
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.