P. Chakkrit asks how to generate a random sample,
with replacement, of observations in a dataset:
> I would like to ask that how can we random sampling our dataset more than _n
> or _N. I think we have to write a program to repeat more times instead. I
> never write any program before, I also have tried it but it did not work
> out(it only sampling 1 data and finish) so I would be very appreciate if you
> could help me. Below is my program;
> program define sam1 /* e region Nk: to sampling e(1-841) by region(1-19)
> no.Nk in each region (more than no. of e in each region) */
> local t=1
> while `t'<=19{
> local i=1
> while `i'<=`3'{
> keep if region==`t'
> sample 1, count
> local i=`i'+1
> }
> local t =`t'+1
> }
> end
The Stata command to perform sampling of observations with replacement is
-bsample-. With the appropriate options, -bsample- can also perform
stratified, cluster, and stratified-cluster sampling with replacement. (The
stratified sampling features were added in Stata 8.)
Based on P. Chakkrit's question and code, I think we are talking about
stratified sampling. If the sample size in each -region- (the strata
variable) were no larger than the number of observations in that region, I
would suggest one of the following:
To sample as many observations as there are in each -region-:
. bsample , strata(region)
To sample 10 observations within each -region-:
. bsample 10, strata(region)
To sample roughly half the observations within each region:
. bysort region: gen half = int(_N/2)
. bsample half, strata(region)
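As a quick check of the stratified examples above, here is a minimal sketch
using the auto dataset shipped with Stata. The variable -foreign- stands in
for -region-; that substitution is mine, not P. Chakkrit's data, and the seed
is arbitrary. The smaller stratum in auto.dta has 22 observations, so asking
for 10 per stratum stays within the limit.

```stata
* Sketch only: auto.dta and -foreign- stand in for the real data and -region-
sysuse auto, clear
* arbitrary seed, so the draw can be reproduced
set seed 12345
* 10 draws with replacement within each stratum
bsample 10, strata(foreign)
* each stratum should now show 10 observations
tabulate foreign
```

Because the draws are made with replacement, some cars will typically appear
more than once in the result and others not at all.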
Unfortunately for P. Chakkrit, the algorithm implemented in -bsample- requires
that the sample size(s) be less than or equal to the number of observations
(within strata).
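A way around this is to first expand the data, then use -bsample-. Because
-expand- duplicates every observation the same number of times, each original
observation is still equally likely to be chosen on every draw, so the result
is still a sample with replacement from the original data. How much you
expand the data depends upon your situation. Here are a few examples: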
To sample twice as many observations, per stratum, as there are in the data:
. expand 2
. bsample , strata(region)
To sample an extra 10 observations from within each stratum, assuming there
are at least 10 observations within each stratum (generate the sample-size
variable before expanding, so that _N is still the original stratum size):
. bysort region: gen Nplus10 = _N+10
. expand 2
. bsample Nplus10, strata(region)
To sample 100 observations, per stratum, when there are 30 observations within
the smallest stratum (expand by 4, since there will be at least 4*30=120
observations in each stratum):
. expand 4
. bsample 100, strata(region)
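If you would rather not work out the expansion factor by hand, the last
example can be sketched more generally. This is only a sketch under my own
assumptions: it again uses auto.dta with -foreign- standing in for -region-,
and the macro names k and factor and the variable n_h are made up for the
illustration.

```stata
* Sketch only: auto.dta and -foreign- stand in for the real data and -region-
sysuse auto, clear
* arbitrary seed, so the draws can be reproduced
set seed 12345
* target number of draws per stratum (hypothetical value)
local k = 100
* size of each stratum, then the size of the smallest one
bysort foreign: gen long n_h = _N
summarize n_h, meanonly
* smallest whole number of copies that gives every stratum at least k rows
local factor = ceil(`k'/r(min))
expand `factor'
bsample `k', strata(foreign)
* each stratum should now show k observations
tabulate foreign
```

With auto.dta the smaller stratum has 22 observations, so the factor works
out to 5 (5*22 = 110 >= 100).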
--Jeff