This question is about generating a random sample.
I have a data set with 4,000 observations, containing 480 unique zip codes.
I want to generate a random sample with 30 zip codes.
I've been told (by a SAS user) that the technique I should use is called
"two-stage cluster sampling" or something similar, in which the first stage
is "probability proportionate to size (PPS)." That is, I want zip codes with
more observations to be proportionately more likely to be chosen for the
sample of 30 zip codes.
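
As a rough illustration of the first-stage selection described above, here is a minimal sketch in Python rather than Stata. It uses systematic PPS selection, which is only one common way of drawing a PPS sample and is an assumption here, not necessarily what the SAS user had in mind. The zip codes and counts are synthetic stand-ins for the real data, and the function name pps_systematic is hypothetical.

```python
import random
from collections import Counter


def pps_systematic(sizes, n, rng=None):
    """Select n clusters with probability proportional to size.

    sizes: dict mapping cluster id -> number of observations.
    Assumes no cluster is larger than the sampling interval total/n;
    a larger cluster could otherwise be selected more than once.
    """
    rng = rng or random.Random()
    items = list(sizes.items())
    rng.shuffle(items)  # randomise the list order before cumulating sizes
    total = sum(size for _, size in items)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    idx = 0
    for cluster, size in items:
        cumulative += size
        # A cluster is chosen when a selection point falls in its size range.
        while idx < n and points[idx] < cumulative:
            selected.append(cluster)
            idx += 1
    return selected


# Synthetic data: 4,000 observations spread over 480 made-up zip codes.
rng = random.Random(12345)
zips = [str(10000 + i) for i in range(480)]
obs_zips = zips + rng.choices(zips, k=4000 - len(zips))
sizes = Counter(obs_zips)  # observations per zip code

sample_zips = pps_systematic(sizes, 30, rng)
print(sorted(sample_zips))
```

Under this scheme a zip code with twice as many observations is twice as likely to be drawn. The second stage would then sample observations within each selected zip code.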
I have looked online and found the "sample2" command, but it only allows
sampling by percentage and does not appear to do PPS in the first stage.
Thanks for any suggestions.
Gretchen Caspary