Reo,
You could try -reshape- on your data first, so that each ID is a single
row, and then -bsample-.
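A rough sketch of that idea, not a tested solution. It assumes id is the user ID and uses y and x as stand-ins for the variables that vary within a user; obsno and n_obs are hypothetical helper names. Every other variable must be constant within id, or -reshape wide- will stop with an error. Because the aim is 600 distinct users, the sketch swaps in -sample- (without replacement) where -bsample- (with replacement) was suggested.

```stata
* Sketch only: id, y, x, obsno and n_obs are assumed or hypothetical names.
set seed 20080731                  // any seed; only for reproducibility
bysort id: gen obsno = _n          // within-user counter: 1, 2, ...
by id: gen n_obs = _N              // observations per user, constant within id
reshape wide y x, i(id) j(obsno)   // one row per user
sample 600, count                  // keep 600 users, drawn without replacement
reshape long y x, i(id) j(obsno)   // back to one row per observation
drop if obsno > n_obs              // remove the padding rows reshape long creates
```

The final -drop- is needed because -reshape long- gives every user as many rows as the largest user has, filling the extra rows with missing values.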
Peter
Peter Adamson
Epidemiology & Genetics Unit
Department of Health Sciences
Area 3 Seebohm Rowntree Building
University of York
York
YO10 5DD
TEL: +44(0) 1904 321879
FAX: +44(0) 1904 321899
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Song
Sent: 31 July 2008 16:38
To: [email protected]
Subject: st: Taking random samples from data
Hi All,
I have a question about taking random samples from my data. My dataset
has around 12,500 user IDs and 200,000 observations in total, and I want
to take a random sample of around 500-600 users. The problem is that
each user has multiple observations (between 4 and 21 per ID), and I
want to keep all of a user's observations whenever that user is sampled.
For example, if ID number 5 has 10 observations, I want all 10 of them
if ID number 5 is included in the sample.
I tried the following and ended up with 580 users and around 8,800
observations. This method works, but I wonder whether there is a better
way to do this job, because with this method I have to drop the
duplicated samples afterwards.
bysort id: gen idcnt=_N /* observations per ID in the full data */
/* sampling with replacement: I do not know how to take cluster samples
   without replacement. */
bsample 600, cluster(id)
duplicates drop /* To drop duplicated samples, keeping one copy of each */
bysort id: egen idcount=count(id)
compare idcount idcnt /* check that each sampled ID kept all its observations */
I would greatly appreciate your help.
Reo.
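On the point raised in the code comment above, one possible way to draw clusters without replacement is to give each ID a single random number and keep the 600 IDs with the smallest draws. This is a minimal sketch, assuming id is the user ID as in the post; tag, u and pick are hypothetical helper names, and runiform() assumes a reasonably recent Stata.

```stata
* Sketch only: tag, u and pick are hypothetical helper variables.
set seed 20080731                          // any seed; only for reproducibility
egen byte tag = tag(id)                    // 1 for exactly one observation per id
gen double u = runiform() if tag           // one random draw per user
sort u                                     // missing values of u sort last
gen byte pick = _n <= 600                  // flag the 600 smallest draws
bysort id (pick): replace pick = pick[_N]  // spread the flag to all rows of a user
keep if pick                               // all observations of the sampled users
drop tag u pick
```

Since every ID can be chosen at most once, no duplicates are created and the result contains exactly 600 users with all of their observations.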
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/