Augusto Hoszowski <[email protected]> wrote:
> I need to select one sample stratified. My file contains the id_strata, the
> size of the sample in the strata and size_sample. What is wrong with the
> syntax ?
>
> program define seleccio
> local tamanio size_sample[`1']
> by id_strata: sample `tamanio', count
> end
> seleccio 1
>
> Sincerely yours,
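The -sample- command requires an explicit number, not an expression. Without
an equals sign, -local- stores the text of the expression rather than its
value, so -sample- sees a variable name where it expects a number, thus the
error message: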
. by id : sample sample_size[1], count
'size' found where number expected
r(7);
If the variable sample_size contains the same value for all observations, then
use
. local n = sample_size[1]
. by id : sample `n', count
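Applied to your original program, that change might look like the sketch
below. It assumes the data are already sorted by id_strata and that
size_sample holds the same value in every observation; the file name
seleccio.do is only for illustration.
***** BEGIN seleccio.do
cap program drop seleccio
program define seleccio
    * the = makes -local- evaluate the expression and store the number
    local tamanio = size_sample[`1']
    by id_strata : sample `tamanio', count
end
seleccio 1
***** END seleccio.do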
If sample_size is constant within id_strata, but different for different
values of id_strata, then some programming is required. In the following
example, I generate some data, define my sample selector program, run it on
the data, then summarize the results using -tabulate- and -summarize-. The
-sel1- program requires 3 arguments:
arg 1: id -- a stratum id variable
arg 2: size -- a sample size variable -- containing sample sizes for each
stratum. Note that I do not check to make sure that this variable is constant
within the id variable; I'll leave that as an exercise. :)
arg 3: tokeep -- a valid name to use to generate a variable that indicates
which observations were selected for the sample. Note that I just -drop- this
variable if it exists, then generate my own at the end.
***** BEGIN mysam.do
cap log close
* generate some data
clear
local obs 100
set obs `obs'
set seed 92507
* the strata id variable
gen id = int(5*uniform()) + 1
sort id
* the size for each stratum
gen size = .
by id : replace size = cond(_n==1,int(_N*(1 + uniform())/2), size[_n-1] )
* some measurement
by id : gen y = id*( 1 + invnorm(uniform()) )
* my program that indicates the sampled observations
cap program drop sel1
program define sel1
    args id size tokeep
    /* id     : group id
     * size   : sample size (NOTE: assumed constant within -id-)
     * tokeep : name of var to indicate sampled obs
     */
    confirm var `id'
    confirm numeric var `size'
    confirm name `tokeep'
    /* replace with my own sample indicator var */
    cap drop `tokeep'
    /* randomly order the obs */
    tempvar r
    gen `r' = uniform()
    sort `id' `r'
    /* generate sample indicator */
    by `id' : gen `tokeep' = _n<=`size'
end
qui log using mysam.log, replace
sel1 id size kept
tab id, sum(size) mean obs
tab id if kept, sum(size) mean obs
sum y
sum y if kept
qui log close
***** END mysam.do
Here is the log produced by the above -do- file.
***** BEGIN mysam.log
. sel1 id size kept

. tab id, sum(size) mean obs

            |   Summary of size
         id |        Mean        Obs.
------------+------------------------
          1 |          14          20
          2 |          16          19
          3 |          14          16
          4 |          12          22
          5 |          21          23
------------+------------------------
      Total |       15.55         100

. tab id if kept, sum(size) mean obs

            |   Summary of size
         id |        Mean        Obs.
------------+------------------------
          1 |          14          14
          2 |          16          16
          3 |          14          14
          4 |          12          12
          5 |          21          21
------------+------------------------
      Total |   16.012987          77

. sum y

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           y |       100    3.421033     3.903548  -2.579357   16.88143

. sum y if kept

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           y |        77    3.598924     4.157545  -2.140999   16.88143

. qui log close
***** END mysam.log
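As for the check that -sel1- skips: one way to confirm that the size variable
is constant within each stratum is to sort on it within the group and compare
the first and last values. This is only a sketch; it assumes the variable
names id and size from mysam.do, and the file name chksize.do is made up for
illustration.
***** BEGIN chksize.do
* sort by id, and by size within id, then compare the two extremes;
* -assert- stops with an error if any stratum holds more than one size
bysort id (size) : assert size[1] == size[_N]
***** END chksize.do
Inside -sel1-, the same line with `id' and `size' substituted could go right
after the -confirm- commands, before the random sort.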
--Jeff
[email protected]