
st: RE: Taking random samples from data

From	"Peter Adamson" <[email protected]>
To	<[email protected]>
Subject	st: RE: Taking random samples from data
Date	Thu, 31 Jul 2008 16:50:58 +0100

Reo,

You could try -reshape- on your data first, so that each ID occupies a single row, and then -bsample-.

Peter

Peter Adamson

Epidemiology & Genetics Unit

Department of Health Sciences

Area 3 Seebohm Rowntree Building

University of York

York

YO10 5DD

TEL: +44(0) 1904 321879

FAX: +44(0) 1904 321899
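
The suggestion above might be sketched as follows. This is only an illustration on made-up data: the variable names (id, seq, y) and the sample sizes are assumptions, not taken from the original dataset. It also swaps -sample- (without replacement) in for -bsample- (with replacement), because the question asks for sampling without replacement and because duplicated IDs would stop -reshape long- from running afterwards.

```stata
* Toy long-format data: 50 IDs with 2 to 4 observations each (hypothetical names)
clear
set seed 12345
set obs 50
gen id = _n
expand 2 + mod(id, 3)
bysort id: gen seq = _n          // within-ID observation number
gen y = runiform()

* One row per ID, so that sampling rows is the same as sampling IDs
reshape wide y, i(id) j(seq)

* Draw 10 IDs without replacement (swapped in for -bsample-)
sample 10, count

* Back to long format; rows that never existed come back as missing
reshape long y, i(id) j(seq)
drop if missing(y)               // safe here only because y has no true missings
```

With real data every variable that changes within ID would have to be listed in -reshape-, and the final -drop- would need a rule that does not discard genuinely missing values.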


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Song
Sent: 31 July 2008 16:38
To: [email protected]
Subject: st: Taking random samples from data

Hi All,

I have a question about taking random samples from my data. My dataset has
around 12,500 user IDs with 200,000 observations in total, and I want to draw
a random sample of around 500-600 users. The problem is that each user has
multiple observations (4 to 21 per ID), and I want to keep all of a user's
observations whenever that user is sampled. For example, if ID number 5 has
10 observations, I want all 10 of them whenever ID number 5 is included in
the sample.

I tried the following and ended up with 580 users and around 8,800
observations. This method works, but I wonder whether there is a better way
to do this job, because with this method I have to drop the duplicated
samples.

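bysort id: gen idcnt = _N     /* observations per ID before sampling */
bsample 600, cluster(id)      /* sampling with replacement: I do not know how
                                 to take cluster samples without replacement */
bysort id: egen idcount = count(id)
compare idcount idcnt         /* idcount > idcnt where an ID was drawn more than once */
duplicates drop               /* keep one copy of each duplicated observation;
                                 dup==1 would remove both copies of an ID drawn twice */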

I would greatly appreciate your help.
Reo. 
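
One possible way to draw whole IDs without replacement, sketched here on made-up data, is to give each ID a single random number and keep the IDs with the smallest values. The variable names (id, u, pick) and the sizes are hypothetical, and this is an illustration under those assumptions rather than the method proposed anywhere in this thread.

```stata
* Toy data: 100 IDs with 2 to 5 observations each (hypothetical names)
clear
set seed 2008
set obs 100
gen id = _n
expand 2 + mod(id, 4)

* One random number per ID, copied to all of that ID's observations
bysort id: gen double u = runiform() if _n == 1
by id: replace u = u[1]

* Number the IDs in order of their random value (id breaks any ties)
egen long pick = group(u id)

* Keep every observation of the first 20 IDs: no ID can appear twice
keep if pick <= 20
```

Because each ID gets exactly one value of pick, every kept ID retains all of its observations and no duplicates need to be dropped afterwards.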

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


Follow-Ups:
- st: RE: RE: Taking random samples from data
  - From: "Nick Cox" <[email protected]>

