st: Re: RE: when your sample is the entire population
I would just say -- "what Nick says" ;)
But I'd like to emphasize one aspect related to his points 3 and/or 4 --
measurement error. In many real applications, the outcome (and, unfortunately,
often the predictors as well) is measured with error, so you have uncertainty even
with data for the full population. The superpopulation concept (point 1) also
seems quite reasonable, at least for most program evaluation questions, where
you may collect data for all program participants (or all kids in a school), but they
can be considered a sample from some larger potential population. Of course, in
program evaluation you still have uncertainty introduced by whatever
comparison/control group is employed in the analysis.
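As a small, purely illustrative sketch of the measurement error point
(simulated data, nothing from this thread): even when the data cover every
unit of interest, classical measurement error in a predictor leaves the
estimated coefficient both attenuated and uncertain.

clear
set obs 10000
set seed 12345
generate x_true = rnormal()
generate y = 1 + 2*x_true + rnormal()
* suppose we only observe a noisy version of the predictor
generate x_obs = x_true + rnormal(0, 0.5)
regress y x_true    // recovers a coefficient near 2
regress y x_obs     // attenuated coefficient, nonzero standard error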
Michael Blasnik
----- Original Message -----
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Friday, January 18, 2008 3:02 PM
Subject: st: RE: when your sample is the entire population
I guess most people will have a short answer and a long answer
to this one. You are going to get my short answer.
Also, in statistical science, it seems that most people who think they
have a reasonably smart, or at least sensible, answer think some of the
other guys' reasonably smart answers are really fairly stupid, or at
least difficult to understand. So it may be colourful if and when people
start telling me that after a few decades of sweat and toil I _still_
don't understand statistics at all.
If the question is what meaning is attached to a P-value, then there
seem to be many possible partial answers.
1. I am looking only at a sample of size n and I think of this as only
one of many possible samples of the same size from a larger population.
That is most plausible if someone really did select that sample using
random numbers, or something equivalent, and it's a greater or lesser
stretch otherwise. In many cases the sample you have just fell into your
lap somehow
and the whole exercise is to treat the data _as if_ it were a random
sample, partly because that's a calculation you can do. There's usually
some wishful thinking involved. Both texts and teachers vary enormously
on how candidly they discuss what is going on. This seems to be what is
most emphasised in most introductory courses and texts, but it may be
the least applicable story in statistical practice!
2. I am looking at a sample of size n and I am willing to think of this
as one possible outcome among many. I can get a reference population by
resampling the data I have repeatedly. Permutation and bootstrap methods
fit under this heading. I think it wry that in less than 30 years
bootstrap methods have gone from being widely regarded as a form of
cheating to being widely considered as the best way to get a P-value in
many problems.
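For concreteness, a toy sketch with Stata's shipped auto data (just an
illustration of the machinery, not a model anyone should take seriously):

sysuse auto, clear
* bootstrap: resample the observed data with replacement and
* summarise the variation in the refitted coefficients
bootstrap _b, reps(1000) seed(2008): regress mpg weight
* permutation: shuffle the predictor to build a null reference
* distribution for its coefficient
permute foreign _b[foreign], reps(1000) seed(2008): regress mpg foreign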
3. I have a model, at its simplest the response as a function of predictors
plus some error term, and the uncertainty comes from the fact that the
model is always an approximation and is stochastic by virtue of its error
term. Whether your n is the whole N is immaterial, because the
uncertainty is not about sampling at all.
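In Stata terms, a minimal sketch of that reading (again the shipped auto
data, purely for illustration): fit the model to all N units you have; the
standard errors and P-values then summarise the model's error term, not the
sampling of units from anywhere.

* treat the 74 cars as the whole population of interest
sysuse auto, clear
* mpg = b0 + b1*weight + b2*foreign + error; inference rests on the error
* term of this (approximate) model, not on how the cars were sampled
regress mpg weight foreign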
4. What I have I regard as the realisation of a stochastic process
(usually in time, or space, or both). The realisation is unique, but at
least in principle there could have been other realisations.
I won't quarrel with anyone who thinks #3 and #4 sound the same.
5. Bayesians have other stories.
6. I must have forgotten or be unaware of yet other stories. Bill Gould
has tried to explain quantum mechanics to me several times. I am pretty
clear that he understands it very well.
In these terms you seem to be saying #1 does not apply in your case, but
that still leaves other arguments, and there is a lot of scope for
arguing what is central to #1 in any case.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/