Re: st: sample adjustment by substitution instead of weighting
Thank you very much, Steve, for your detailed answer; it is very
helpful indeed!
Dirk
On Steve's behalf, I include here his reply to
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0904/date/article-1134.html
because at the moment he cannot send mail to the list:
---------------------------------------------------------------------
Dirk-
I've never heard of this procedure. There is some basis for thinning a
sample randomly to meet sampling goals, and substitutions for missing
observations are also practiced, but you are not describing either of these.
The process of exclusion and duplication will destroy the ability of the
sample to estimate anything but the characteristics that are being
matched, and those are already known! For instance, the sample cannot
estimate the means of other variates without bias. For the matched
characteristics, the sample will not permit estimation of SDs or
quantiles. Moreover, no valid standard errors or confidence intervals can
be computed for anything, because the exclusions and duplication have
artificially reduced the variability in the sample.
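To illustrate why standard errors from such an adjusted file cannot be trusted, here is a minimal sketch with made-up numbers (the values and the helper `naive_se` are hypothetical, not from the original question). Duplicating a respondent adds rows but no new information, yet a standard error that treats every row as an independent draw still shrinks.

```python
import statistics
from math import sqrt

def naive_se(values):
    """SE of the mean that (wrongly, here) treats every row as an independent draw."""
    return statistics.stdev(values) / sqrt(len(values))

# Hypothetical outcome values for four real respondents.
original = [10.0, 12.0, 14.0, 20.0]

# "Substitution" by duplication: the last respondent is copied twice more,
# e.g. because they belong to an under-represented group.
duplicated = original + [20.0, 20.0]

print(round(naive_se(original), 2), round(naive_se(duplicated), 2))
```

The naive SE drops from about 2.16 to about 1.86, even though the duplicated file still describes only four distinct respondents.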
To better match the sample estimates to known population
characteristics, I know of only three procedures: 1) post-stratification;
2) sample raking, which is an extension of post-stratification; and
3) generalized regression estimation (GREG).
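For contrast with exclusion and duplication, a minimal sketch of raking (iterative proportional fitting of weights) follows. It assumes each respondent is a dict of category labels and that population shares are known for each margin; the function name `rake` and the toy data are hypothetical, and this is not Stata's survey machinery. With a single margin, raking converges after one pass to the post-stratification weights (population share divided by sample share). GREG is not shown.

```python
def rake(rows, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of unit weights.

    rows:    list of dicts, one per respondent, mapping variable -> category
    margins: dict mapping variable -> {category: known population share}
    Returns weights summing to len(rows) whose weighted category shares
    match every margin once the iteration has converged.
    """
    n = len(rows)
    w = [1.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for var, shares in margins.items():
            # Current weighted total in each category of this variable.
            totals = {}
            for wi, row in zip(w, rows):
                totals[row[var]] = totals.get(row[var], 0.0) + wi
            # Rescale so this margin hits its population share exactly.
            for i, row in enumerate(rows):
                factor = shares[row[var]] * n / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # every margin already matched
            break
    return w

# Post-stratification as the one-margin case: 1 man, 3 women, target 50/50.
rows = [{"sex": "m"}, {"sex": "f"}, {"sex": "f"}, {"sex": "f"}]
y = [20.0, 10.0, 12.0, 14.0]
w = rake(rows, {"sex": {"m": 0.5, "f": 0.5}})
print(w, sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
```

Here the man gets weight 2 and each woman 2/3, so the weighted mean is approximately 16, the average of the two stratum means (20 and 12). No respondent is dropped or copied, so every observation keeps contributing information.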
The exclusions and duplication are naive attempts to re-weight the
sample. However, they completely destroy it. So no, this is not accepted
practice. The only discussion of something similar that I've read is in
Lohr (1999, Sampling: Design and Analysis, Duxbury, p. 463), who gives the
reference to Neyman, J. 1934. On the two different aspects of the
representative method: The method of stratified sampling and the method
of purposive selection. J. Royal Statistical Society 97: 558-606. Here
is the quote from her book:
"Neyman's paper pretty much finished off the idea that results from
purposive samples could be generalized to the population. He presented
an example of the purposive sample taken by Gini and Galvani in the late
1920's. Gini and Galvani chose 29 districts that gave the averages of
all 214 districts in the 1921 Italian census, on a dozen variables. But
Neyman showed that all statistics other than the average values of the
controls showed a violent contrast between the sample and the whole
population."
Of course, Gini and Galvani only excluded districts; they did not
duplicate any. Still, the procedure has long been discredited.
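The contrast Neyman described can be reproduced in miniature. The sketch below uses made-up data (a hypothetical population of 100 units with control values 1 to 100, not the Italian census) and mimics purposive selection by keeping the 10 units whose control variable x lies closest to its known population mean. It then compares the SD of x and the mean of a second variable, y = x squared, chosen purely for illustration.

```python
import statistics

# Hypothetical population: x is the control whose mean is known,
# y is some other variable of interest (here simply x squared).
pop_x = list(range(1, 101))
pop_y = [v * v for v in pop_x]

target = statistics.fmean(pop_x)  # known control mean

# Purposive selection: keep the 10 units closest to the control mean.
chosen = sorted(pop_x, key=lambda v: abs(v - target))[:10]
sample_y = [v * v for v in chosen]

print("mean x:", target, statistics.fmean(chosen))
print("SD x:  ", statistics.pstdev(pop_x), statistics.pstdev(chosen))
print("mean y:", statistics.fmean(pop_y), statistics.fmean(sample_y))
```

The control mean matches exactly (50.5 in both), but the SD of x is about 28.9 in the population versus about 2.9 in the selected units, and the mean of y is 3383.5 versus 2558.5: only the matched averages agree.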
-Steve
---------------------------------------------------------------------