Re: st: Svy subsamples
You are not interfering. This is a conversation open to all. This is
a slightly expanded version of what I sent to you privately.
How to treat the subpopulation and weights depends on the purpose of
the study. There is a Statalist thread which you can look up. First,
note that the 'subpopulation' Richard's student wants to study is not
a 'subsample'. I have sometimes taken 10 random subsamples of a
single population to study variability between samples. This is the
method of 'interpenetrating replicated subsamples' of Mahalanobis
which was popularized by W. E. Deming in the 1950s (Sample Design in
Business Research, Wiley, 1960).
To expand on the reason for ignoring the subpopulation criterion: if
Richard's student were to analyze the data as a subpopulation, then
every sample mean would have to be considered a ratio estimate and
analyzed with a 'ratio' procedure, which is what the 'subpop' option
in the survey commands does. This is because the denominator in
mean = (sum of X variable)/(no. of people in the subpopulation)
would be considered a random variable. At an extreme, the very
appearance of a subpopulation is a random event, and the appropriate
SE takes this into account. However, it is likely that Richard's
student is interested in the subpopulation as a way of studying a
question unrelated to the original target population (see below). In
theoretical terms, she may want to study associations conditional on
membership in the subpopulation.
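As a rough illustration of the ratio-estimate point, here is a minimal
Python sketch. It is not Stata's implementation. It assumes a single
stratum with independently sampled observations, and the function name
and the toy data are made up for the example. The subpopulation mean is
computed as a ratio of two estimated totals, and the linearized SE keeps
every sampled observation, giving a zero contribution to those outside
the subpopulation.

```python
import math
import statistics


def domain_mean_and_se(y, w, in_domain):
    """Weighted subpopulation (domain) mean and its linearized SE.

    Assumes one stratum and independently sampled observations;
    designs with strata and clusters need a fuller treatment.
    """
    n = len(y)
    # Numerator and denominator are both estimated totals, hence random.
    total_y = sum(wi * yi for yi, wi, di in zip(y, w, in_domain) if di)
    size = sum(wi for wi, di in zip(w, in_domain) if di)
    mean = total_y / size
    # Linearized residuals: zero outside the domain, but those
    # observations still count toward the sample size n.
    z = [wi * (yi - mean) / size if di else 0.0
         for yi, wi, di in zip(y, w, in_domain)]
    var = n / (n - 1) * sum(zi * zi for zi in z)
    return mean, math.sqrt(var)


# Hypothetical data: six sampled people, three in the subpopulation.
y = [3.0, 5.0, 4.0, 8.0, 6.0, 7.0]
w = [10.0] * 6
d = [True, True, True, False, False, False]

mean, ratio_se = domain_mean_and_se(y, w, d)

# SE conditional on membership: treat the three members as the whole sample.
members = [yi for yi, di in zip(y, d) if di]
conditional_se = statistics.stdev(members) / math.sqrt(len(members))

print(mean, ratio_se, conditional_se)
```

The two standard errors differ because the first treats the number of
subpopulation members in the sample as random and the second
conditions on it. In a simple design like this the gap tends to be
small; it can matter much more when the sample is clustered.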
To answer your question about weights:
1. If the purpose of a study is analytic (hypothesis testing,
studying relations between variables), then Richard's student may not
really be interested in the original target population. As an
example, she might never report the weighted counts; she would report
the sample counts for crucial variables. The only weights that I
would suggest, if any, are those which correct for non-response and
unequal probability of selection.
2. It may be better to consider the study as an 'experimental
design', where population numbers of the experimental groups are not
relevant. In Survey Errors and Survey Costs (Wiley), R. Groves poses
the example of a study of noise in the vicinity of an airport. The
study divides the area around the airport into 'strata', which are
zones at equal distance from the flight path or airport. An equal
sample size is taken from each zone, and the goal is to study the
relation of noise to distance. Of course, most people in the study
area will not live in the closest zones. A weighted analysis would
give the closest people their (small) population weight. This would
be okay if the main goal were descriptive: to estimate the 'average'
noise experienced by residents around the airport. However, if you
consider this an experimental design, then you want equal numbers at
each dose, or, in fact, more at the extremes. Thus you would not
apply the population weights.
You may think this is an extreme case, but I have seen just this
analysis in a published study of the association between gestational
age and birth weight. Low birth weight infants were oversampled
(they are only 5-10% of the population). Yet the analysts did the
weighted analysis, which meant that the association in the vicinity
of low birth weights was badly determined unless the model was
correct.
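To put rough numbers on the airport example, the Python sketch below
compares the variance of a fitted noise-on-distance slope with and
without population weights. Everything in it is an assumption made for
illustration: the four zones, their distances and population sizes, and
above all a correct straight-line model with independent, equal-variance
errors. Under those assumptions the variance of the slope is
sigma^2 * sum(w^2 (x - xbar_w)^2) / (sum(w (x - xbar_w)^2))^2, where
xbar_w is the weighted mean of x.

```python
# Hypothetical zones: distance from the airport (km), number of
# residents, and an equal sample size in every zone.
zones = [
    {"distance": 1.0, "population": 100, "sample": 50},
    {"distance": 2.0, "population": 500, "sample": 50},
    {"distance": 3.0, "population": 2000, "sample": 50},
    {"distance": 4.0, "population": 8000, "sample": 50},
]


def slope_variance_factor(x, w):
    """Var(slope) / sigma^2 for weighted least squares of y on x.

    Assumes the linear model is correct and the errors are independent
    with equal variance sigma^2.
    """
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sum(w)
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return sum(wi ** 2 * (xi - xbar) ** 2 for xi, wi in zip(x, w)) / sxx ** 2


# One record per sampled person: distance and population weight N_h / n_h.
x, w_pop = [], []
for zone in zones:
    x += [zone["distance"]] * zone["sample"]
    w_pop += [zone["population"] / zone["sample"]] * zone["sample"]
w_equal = [1.0] * len(x)

unweighted = slope_variance_factor(x, w_equal)
weighted = slope_variance_factor(x, w_pop)

print("unweighted slope variance / sigma^2:", unweighted)
print("weighted slope variance / sigma^2:  ", weighted)
print("ratio weighted / unweighted:        ", weighted / unweighted)
```

With these made-up numbers the weighted slope is noticeably less
precise, because the population weights concentrate the analysis on the
far zones. If the straight-line model is wrong, the comparison changes:
the weighted and unweighted fits then estimate different things.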
This is an ongoing debate among survey statisticians, so you will get
different points of view.
On Nov 21, 2007, at 3:08 PM, John Singhammer wrote:
To Steven Samuels
Forgive me for interfering in your conversation with Mr. Richard
Williams.
However, I'm dealing with a dataset consisting of 10 subsamples with
information collected over a period of 7 years.
I was just wondering why you suggest ignoring the study
weights, especially if they were post-stratified?
Regards,
--
John Singhammer, Dr.phil, Mphil
Dept. of Public Health
Olof Palmes Allé 17
DK8200 Aarhus
Tel: +45 8728 4715
Mobile phone: +45 2530 5768
Steven Samuels
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/