st: Survey statistics, sampling methods
Hi -
I want to thank everyone who provided a response to the question I
posted to the list last week. All were very useful.
I have another set of questions (probably pretty simple) about analyzing
survey data with the svyset command. My main concern is that I am not
"naming" the steps we took in our sampling correctly, in the terms that
svyset uses.
Based on what I have read about svyset in the list archives and the
manual, we think we want to set things up like this:
svyset university [pw=pweight], strata(prim_sampling_unit) fpc(<?>) ///
    || pre_svy_dept, fpc(dept_ratio)
where:
  university         = variable with the codes for each university in our
                       sample
  pweight            = our probability weight (the pw), 0.061 (6/98)
  prim_sampling_unit = variable with the codes for each of the primary
                       sampling strata we used
  pre_svy_dept       = variable with the codes for each of the departments
                       selected as our secondary strata
  dept_ratio         = 0.24 (5/21)
Given the method (described below), are we setting the correct
parameters for the svyset command?
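A small sketch, assuming the respondent-level data are in memory and that no variables named pweight or dept_ratio exist yet: svyset expects variable names inside [pw=...] and fpc(), so constants such as 6/98 and 5/21 have to be stored in variables first. The values simply reproduce the ratios proposed above; they are not a recommendation.

```stata
* Hypothetical: run with the respondent-level data in memory.
* svyset's [pw=] and fpc() take variables, so store the constants as variables.
generate double pweight    = 6/98   // about .061, the weight proposed above
generate double dept_ratio = 5/21   // about .238, the department ratio
summarize pweight dept_ratio
```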
I have included a lot of detail on our methodology, so I apologize for
the length of this message. If you have the patience to read it and
offer any insight whatsoever, it will be much appreciated.
My colleague and I conducted a national survey to assess the attitudes
of life scientists toward the ethical and societal implications of their
research. We sent 2000 surveys to life scientists at 7 research
universities and received 855 back. About 10% of those sampled could
not be contacted, so our response rate is about 50%.
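The "about 50%" figure can be checked with a few lines of arithmetic, assuming the 10% no-contact rate applies to all 2000 surveys sent:

```stata
* Response-rate arithmetic from the figures above (10% never contacted)
local sent     = 2000
local returned = 855
local reached  = `sent' * (1 - 0.10)   // surveys that could be delivered
display "returned / sent    = " `returned' / `sent'
display "returned / reached = " `returned' / `reached'
```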
We used multistage sampling. Our target population is all life science
researchers at US research universities. Our survey population (sampling
frame) is the top 98 NIH-funded research universities in 2004 (taken
from a publicly available website). We assigned each of these
universities to one of 8 strata:
Stratum/category                                  No. of universities
medical school/bioethics presence/public          13
medical school/bioethics presence/private         13
medical school/no bioethics presence/public       45
medical school/no bioethics presence/private      19
no medical school/bioethics presence/public        0
no medical school/bioethics presence/private       0
no medical school/no bioethics presence/public     4
no medical school/no bioethics presence/private    3
We randomly selected one university from each of the 6 strata that
contain universities. Our home institution was added as a 7th
institution.
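Under this design, the university drawn from stratum h was selected with probability 1/N_h, where N_h is that stratum's count in the table. A sketch of the first stage as a small Stata dataset follows. The stratum numbering 1 to 6 is made up for illustration, and clear discards whatever is in memory, so run it in a scratch session. The home institution is left out because it was not drawn at random.

```stata
* Scratch example: -clear- discards the data in memory.
* Hypothetical codes for the 6 non-empty strata in the table above:
* 1-4 = medical school (bioethics/public, bioethics/private,
*       no bioethics/public, no bioethics/private)
* 5-6 = no medical school, no bioethics (public, private)
clear
input byte stratum int N_univ byte n_drawn
1 13 1
2 13 1
3 45 1
4 19 1
5  4 1
6  3 1
end
generate double p1 = n_drawn / N_univ   // first-stage selection probability
generate double w1 = 1 / p1             // first-stage weight (N_univ, up to rounding)
list, noobs
```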
We think that the university is our primary sampling unit (the 6 we
selected at random). We also think that the probability weight we want
to use (the pw) is 6/98. Or do we need to approximate the total number
of research universities in the US?
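Stata reads pweights as inverse selection probabilities, so summed over the sample they estimate the number of units in the population. The quick illustration below uses only the six randomly drawn universities and ignores the home institution. It contrasts 6/98 used as each weight with the per-stratum inverse probabilities N_h taken from the table.

```stata
* Illustration only: the 6 randomly drawn universities, two candidate weights.
* Sum if every university gets weight 6/98 (a sampling fraction):
display "sum of weights, each = 6/98 : " 6 * (6/98)
* Sum if each university gets its stratum's inverse probability N_h:
display "sum of weights, each = N_h  : " 13 + 13 + 45 + 19 + 4 + 3
```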
We used departments as our secondary sampling unit. We categorized all
the life science-related departments at each institution as either basic
science or clinical, then randomly selected 3 basic science and 2
clinical departments, for a total of 5 departments (secondary strata?)
from which we drew individual researchers. Across the 7 institutions
there are, on average, about 21 departments that fit our definition of
life science-related departments.
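Departments were stratified, with 3 drawn from basic science and 2 from clinical, so a department's selection probability within a university depends on its type rather than being a single 5/21. The sketch below uses hypothetical counts for one university: 12 basic-science and 9 clinical departments, chosen only because they add to the average of 21.

```stata
* Hypothetical department counts for ONE university (not survey data)
local N_basic    = 12   // assumed basic-science departments
local N_clinical = 9    // assumed clinical departments
display "P(selected), basic science : " 3 / `N_basic'
display "P(selected), clinical      : " 2 / `N_clinical'
display "pooled ratio 5/21          : " 5 / (`N_basic' + `N_clinical')
```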
We are not quite certain what our finite population correction factors
are for the university strata and for the department strata, but we
think they are 1/13, 1/13, 1/45, 1/19, 1/4, and 1/3 (universities) and
5/21 (departments). Are we correct in thinking we need to use these
ratios?
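svyset's fpc() accepts either the sampling fraction (values of at most 1) or the stratum population size. In the usual without-replacement variance formula, each stratum's contribution is scaled by (1 - f), where f is the sampling fraction. The correction therefore matters much more for 1/3 or 1/4 than for 1/45. The multipliers implied by the ratios listed above can be displayed with:

```stata
* (1 - f) multiplier for each ratio listed above; the last is the department ratio
foreach f in 1/13 1/13 1/45 1/19 1/4 1/3 5/21 {
    display "f = `f'" _col(12) "1 - f = " %5.3f (1 - `f')
}
```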
The unit we actually surveyed is the individual researcher (graduate
students, postdoctoral fellows, research staff, and faculty). Sampling
at this stage was done by position (i.e., we put all graduate students
from university 1 on one list and randomly selected about 66, put all
postdocs from university 1 on another list and randomly selected about
66, and so on). We selected about 250 individuals from each of the 6
universities (with a few minor exceptions) and 500 from Stanford, and
tried to get roughly equal numbers from each of the four position
categories. How do we incorporate this into our use of the svyset
command (or do we not need to worry about it)?
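Putting the three stages together gives the minimal do-file sketch below. Every variable name in it is hypothetical. It assumes the position lists were drawn within each selected department. It also assumes the home institution is coded as its own stratum containing a single university (N1 = 1), which is one common way to represent a unit included with certainty. The pweight is the inverse of the product of the three stage-wise selection probabilities. Each randomly sampled stratum also contains just one university, so the first-stage variance cannot be estimated within those strata as specified.

```stata
* SKETCH ONLY. All variable names are hypothetical; one row per respondent.
*   ustratum    first-stage stratum (1-6 = sampled strata, 7 = home institution)
*   university  university ID (first-stage unit)
*   N1          universities in that stratum (13, 45, ...; 1 for the home institution)
*   dtype       1 = basic science, 2 = clinical
*   dept        department ID (second-stage unit)
*   N2, n2      departments of that type in the university / number drawn (3 or 2)
*   position    1-4 = grad student, postdoc, research staff, faculty
*   N3, n3      people in that position in the department / number drawn

* Overall inclusion probability = product of the stage-wise probabilities;
* the pweight is its inverse.
generate double finalwt = 1 / ((1 / N1) * (n2 / N2) * (n3 / N3))

svyset university [pweight = finalwt], strata(ustratum) fpc(N1) ///
    || dept, strata(dtype) fpc(N2)                               ///
    || _n, strata(position) fpc(N3)
svydescribe
```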
Again, we really appreciate any insight anyone might be able to provide.