On 3/11/07, Anna Gueorguieva <[email protected]> wrote:
1. I used the following code to generate my sample schools within each stratum:
set seed 123456789
sample 4.28, by(lgacode numschools)
I was aiming to do a probability proportional to size (pps) sample but
I do not think this is it (correct me if I am wrong).
This is not PPS -- there are a couple of implementations out there. Try
-findit pps- to locate some. I know that mine is not quite proper, but
rather approximate -- at the time of writing it, I was not aware of
all the complications of PPS sampling, which are many.
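To make the contrast with -sample- concrete, here is a minimal sketch of
one textbook PPS scheme, systematic selection on the cumulated size
measure. It is not any of the -findit pps- packages, and the data and
the names enrol, hits, pi and pps_weight are made up for illustration.

```stata
* Hypothetical frame: 100 schools with a made-up size measure
clear
set seed 2007
set obs 100
gen long school = _n
gen enrol = ceil(50 + 950*runiform())    // hypothetical enrollment

local n = 10                             // number of schools to draw

* Put the frame in random order and cumulate the size measure
gen double u = runiform()
sort u
gen double cum = sum(enrol)

* Sampling interval and a random start within the first interval
scalar step  = cum[_N] / `n'
scalar start = scalar(step) * runiform()

* Count the selection points start + k*step that fall in (cum - enrol, cum]
gen hits = floor((cum - scalar(start)) / scalar(step)) ///
         - floor((cum - enrol - scalar(start)) / scalar(step))

* Inclusion probability is proportional to size (valid while enrol < step)
gen double pi = `n' * enrol / cum[_N]
gen double pps_weight = 1 / pi

keep if hits > 0
list school enrol pi pps_weight
```

Under this scheme larger schools are more likely to be drawn and get
smaller weights, whereas -sample- gives every school in a group the same
chance. Schools larger than the sampling interval would be selected with
certainty and need separate handling, which is one of the complications
mentioned above.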
How does the by statement affect my sampling weights?
I think I just did simple random sampling and my code should be:
gen weight=1/sampling_probability=1/(.0428*numschools)
The -by- option does not affect the weights, but it does affect your
final sample size, because 4.28% is drawn separately within each -by-
group. Your -by- variables become the strata in your -svyset-
statement, and your weight should be just 1/0.0428. You really need the
weight only if you are estimating totals, like the number of students
enrolled over the whole population. It will not matter that much when
you are estimating fractions, ratios, or regression lines.
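A rough sketch of how the weights could be computed from the realized
sample rather than typed in by hand. The frame below is made up (three
strata of 200, 350 and 500 schools), and n_sampled is a hypothetical
helper variable. Because -sample- keeps a whole number of schools in
each group, the realized weights should come out close to, but not
exactly, 1/0.0428 = 23.4.

```stata
* Hypothetical sampling frame: one row per school
clear
set obs 1050
gen long school_code = _n
gen byte strata = cond(_n <= 200, 1, cond(_n <= 550, 2, 3))

* Number of schools on the frame in each stratum
bysort strata: gen numschools_bystrata = _N

* Draw 4.28% of the schools within each stratum
set seed 123456789
sample 4.28, by(strata)

* Realized sample size per stratum and the corresponding weight
bysort strata: gen n_sampled = _N
gen double weight = numschools_bystrata / n_sampled

* One weight per stratum, next to the number of sampled schools
tabstat weight, by(strata) statistics(mean n)
```

Computing the weight as frame count over sample count makes the weights
sum to the number of schools on the frame within each stratum, which is
what matters for totals.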
2. After the schools are sampled, teachers are sampled systematically:
One teacher within each class level as the teacher might be selected
as the first, last or middle name in an alphabetized list.
Systematic sampling is tricky -- technically, you cannot estimate
the variance due to that stage unless you make at least two
independent random selections.
So my svyset statement for the school-level dataset should be:
svyset school_code [pw=weight], strata(strata) fpc(numschools_bystrata)
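For what it is worth, here is a small self-contained sketch of that
-svyset- call in use. The frame and the outcome variable enrolled are
invented; the other variable names follow the statement above. The fpc()
variable holds the number of schools on the frame in each stratum.

```stata
* Hypothetical frame with a made-up outcome
clear
set seed 42
set obs 1050
gen long school_code = _n
gen byte strata = cond(_n <= 200, 1, cond(_n <= 550, 2, 3))
gen enrolled = ceil(100 + 400*runiform())    // hypothetical enrollment

* Stratum population counts must be computed before sampling
bysort strata: gen numschools_bystrata = _N

* Stratified simple random sample of schools
sample 4.28, by(strata)
bysort strata: gen double weight = numschools_bystrata / _N

* Schools are the sampling units; strata and FPC as in the statement above
svyset school_code [pw=weight], strata(strata) fpc(numschools_bystrata)

* Totals depend on the weights; means are much less sensitive to them
svy: total enrolled
svy: mean enrolled
```

This declares only the school stage, so the reported standard errors
reflect the sampling of schools and not the systematic selection of
teachers within them.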