To Steven Samuels
Forgive me for interfering in your conversation with Mr. Richard Williams.
However, I'm dealing with a dataset consisting of 10 subsamples, with information collected over a period of 7 years.
I was just wondering why you suggest ignoring the study weights, especially if they were post-stratified?
Regards,
--
John Singhammer, Dr.phil, Mphil
Dept. of Public Health
Olof Palmes Allé 17
DK8200 Aarhus
Tel: +45 8728 4715
Mobile phone: +45 2530 5768
You are not interfering. This is a conversation open to all. What follows is a slightly expanded version of what I sent you privately.
How to treat the subpopulation and the weights depends on the purpose of the study. There is a Statalist thread on this which you can look up. First, note that the 'subpopulation' Richard's student wants to study is not a 'subsample'. I have sometimes taken 10 random subsamples of a single population to study the variability between samples. This is Mahalanobis's method of 'interpenetrating replicated subsamples', which was popularized by W. E. Deming in the 1950s (Sample Design in Business Research, Wiley, 1960).
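As a minimal sketch of how replicated subsamples give a standard error, assume k independent subsamples of equal size, each drawn with the same design. The overall estimate is then the average of the k replicate estimates, and its variance is estimated from the spread between them (the between-replicate variance divided by k). The function name and the replicate means below are hypothetical.

```python
import math

def replicated_estimate(replicate_estimates):
    """Combine estimates from k independent, equally sized replicated subsamples.

    Overall estimate: plain average of the replicate estimates.
    Variance: sum((t_i - t_bar)^2) / (k * (k - 1)), i.e. s^2 / k.
    """
    k = len(replicate_estimates)
    if k < 2:
        raise ValueError("need at least two replicates")
    t_bar = sum(replicate_estimates) / k
    ss = sum((t - t_bar) ** 2 for t in replicate_estimates)
    se = math.sqrt(ss / (k * (k - 1)))
    return t_bar, se

# Hypothetical means from 10 replicated subsamples.
means = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.0, 4.1]
estimate, se = replicated_estimate(means)
print(f"estimate = {estimate:.3f}, SE = {se:.3f}")
```

The appeal of the method is that the SE comes directly from the observed variation between replicates, with no variance formula specific to the design.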
To expand on my reason for ignoring the subpopulation criterion: if Richard's student were to analyze the data as a subpopulation, then every sample mean would have to be treated as a ratio estimate and analyzed with a 'ratio' procedure, which is what the 'subpop' option in the survey commands does. This is because the denominator in mean = (sum of X variable)/(no. of people in the subpopulation) would be considered a random variable. At the extreme, the very appearance of subpopulation members in the sample is a random event, and the appropriate SE takes this into account. However, it is likely that Richard's student is interested in the subpopulation as a way of studying a question unrelated to the original target population (see below). In theoretical terms, she may want to study associations conditional on membership in the subpopulation.
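To illustrate the point about the denominator, here is a minimal sketch of a subpopulation mean computed as a ratio estimate, with a Taylor-linearized SE that treats the estimated subpopulation size as random. It assumes the simplest possible design: one stratum, each observation its own PSU, sampling with replacement, and no finite population correction. It is not Stata's implementation, and the function names and data are hypothetical. For comparison, a second function computes the 'conditional' SE that treats the subpopulation count as fixed.

```python
import math

def subpop_ratio_mean(y, w, in_sub):
    """Subpopulation mean as the ratio sum(w*d*y) / sum(w*d), with a linearized SE.

    Assumes one stratum, each observation its own PSU, with-replacement
    sampling, and no finite population correction.
    """
    n = len(y)
    num = sum(wi * di * yi for yi, wi, di in zip(y, w, in_sub))
    den = sum(wi * di for wi, di in zip(w, in_sub))  # estimated subpopulation size (random)
    r = num / den
    # Linearization scores for the ratio: z_i = w_i * d_i * (y_i - r) / den
    z = [wi * di * (yi - r) / den for yi, wi, di in zip(y, w, in_sub)]
    z_bar = sum(z) / n
    var_r = n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return r, math.sqrt(var_r)

def conditional_mean(y, in_sub):
    """Unweighted subpopulation mean and SE, treating the subpopulation count as fixed."""
    ys = [yi for yi, di in zip(y, in_sub) if di]
    m = len(ys)
    mean = sum(ys) / m
    s2 = sum((yi - mean) ** 2 for yi in ys) / (m - 1)
    return mean, math.sqrt(s2 / m)

# Hypothetical data: equal weights, first three observations in the subpopulation.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [1.0] * 6
d = [1, 1, 1, 0, 0, 0]
print(subpop_ratio_mean(y, w, d))
print(conditional_mean(y, d))
```

Both functions give the same point estimate; only the SEs differ. With equal weights and no clustering the difference is modest; it generally matters more when the number of subpopulation members varies from PSU to PSU.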
To answer your question about weights:
1. If the purpose of a study is analytic (hypothesis testing, studying relations between variables), then Richard's student may not really be interested in the original target population. For example, she might never report the weighted counts; she would report the sample counts for crucial variables. The only weights I would suggest, if any, are those that correct for non-response and unequal probability of selection.
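As a sketch of the kind of weights meant in point 1, the function below builds base weights as the inverse of each unit's selection probability and applies a weighting-class non-response adjustment, deliberately stopping short of any post-stratification to population totals. The weighting-class method is one common choice, assumed here for illustration; the function name and data are hypothetical.

```python
from collections import defaultdict

def selection_and_nonresponse_weights(sel_prob, nr_class, responded):
    """Base weight 1/p, inflated within each weighting class so that respondents
    carry the base weight of the non-respondents in their class.

    Assumes every class has at least one respondent.
    No post-stratification step is applied.
    """
    base = [1.0 / p for p in sel_prob]
    total = defaultdict(float)       # base weight of all sampled units, by class
    resp_total = defaultdict(float)  # base weight of respondents, by class
    for b, c, r in zip(base, nr_class, responded):
        total[c] += b
        if r:
            resp_total[c] += b
    # Non-respondents get weight 0; respondents get base weight * class adjustment factor.
    return [b * total[c] / resp_total[c] if r else 0.0
            for b, c, r in zip(base, nr_class, responded)]

# Hypothetical sample of four units in two weighting classes.
weights = selection_and_nonresponse_weights(
    sel_prob=[0.5, 0.5, 0.25, 0.25],
    nr_class=["a", "a", "b", "b"],
    responded=[True, False, True, True],
)
print(weights)  # [4.0, 0.0, 4.0, 4.0]
```

Note that the adjusted weights still sum to the total of the base weights (12 here); they are not forced to match external population counts.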
2. It may be better to consider the study as an 'experimental design', in which the population sizes of the experimental groups are not relevant. In Survey Errors and Survey Costs (Wiley), R. Groves presents the example of a study of noise in the vicinity of an airport. The area around the airport is divided into 'strata', which are zones at equal distance from the flight path or airport. An equal sample size is taken from each zone, and the goal is to study the relation of noise to distance. Of course, most people in the study area will not live in the closest zones, so a weighted analysis would give residents of those zones only their small population weight. This would be fine if the main goal were descriptive: to estimate the 'average' noise experienced by residents around the airport. However, if you consider this an experimental design, then you want equal numbers at each dose or, in fact, more at the extremes. Thus you would not apply the population weights.
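A small calculation makes the efficiency argument concrete. Assuming a straight-line model of noise on distance with independent, constant-variance errors, the sketch below computes the variance of the population-weighted slope relative to the unweighted least-squares slope for the same sample. Under these assumptions the Gauss-Markov theorem guarantees the ratio is at least 1. The zone distances and population weights are hypothetical.

```python
def slope_variance_ratio(x, w):
    """Var(weighted slope) / Var(unweighted OLS slope) for y = a + b*x + e.

    Assumes independent errors with constant variance sigma^2 (which cancels).
    Weighted slope: sum(w*(x - xw)*y) / sum(w*(x - xw)^2), xw = weighted mean of x.
    """
    n = len(x)
    x_bar = sum(x) / n
    var_ols = 1.0 / sum((xi - x_bar) ** 2 for xi in x)
    sw = sum(w)
    xw_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    num = sum(wi ** 2 * (xi - xw_bar) ** 2 for wi, xi in zip(w, x))
    den = sum(wi * (xi - xw_bar) ** 2 for wi, xi in zip(w, x)) ** 2
    var_weighted = num / den
    return var_weighted / var_ols

# Hypothetical design: 10 respondents in each of 4 distance zones (1 = closest),
# with population weights that grow with distance because few people live close in.
x, w = [], []
for zone, pop_weight in [(1, 5.0), (2, 20.0), (3, 60.0), (4, 150.0)]:
    x += [float(zone)] * 10
    w += [pop_weight] * 10
print(round(slope_variance_ratio(x, w), 2))
```

A ratio above 1 means the population-weighted fit estimates the noise-distance slope less precisely than the unweighted fit, even though both are unbiased if the straight-line model is correct.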
You may think this is an extreme case, but I have seen exactly this analysis in a published study of the association between gestational age and birth weight. Low birth weight infants, who are only 5-10% of the population, were oversampled. Yet the analysts did the weighted analysis, which meant that the association in the low birth weight range was badly determined unless the model was correct.
This is an ongoing debate among survey statisticians, so you will get different points of view.
On Nov 21, 2007, at 3:08 PM, John Singhammer wrote:
I was just wondering why you suggest ignoring the study weights, especially if they were post-stratified?
Steven Samuels
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/