Without repeating points made by others -- especially Maarten Buis
and Stas Kolenikov -- I note that the further out you are in the
tail of the supposed sampling distribution, the more you depend
on all the underlying assumptions being correct. That said,
the idea that you should throw data away seems perverse.
What's important is not to pay more attention to P-values than
they deserve, which often is not much.
Although it is now customary to stress the numerical aspect
of P-values and to downplay test decisions, the latter
can still be a key issue: should I bother to pay attention
to this variable, or does it work as random noise would?
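To make that concrete, here is a minimal sketch with made-up numbers:
with 100,000 observations a deliberately tiny slope of 0.02 clears the
1% hurdle comfortably, while a pure noise regressor usually does not,
so the decision about which variables deserve attention still has content.

* illustrative simulation only; the coefficients are invented
clear
set seed 12345
set obs 100000
generate x = rnormal()         // tiny but real effect (true slope 0.02)
generate noise = rnormal()     // pure noise (true slope 0)
generate y = 0.02*x + rnormal()
regress y x noise              // x clears 1%; noise usually does not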
Nick
[email protected]
Karsten Staehr
I have discussed with a co-author whether datasets used for microeconometric
analyses can be "too large" in the sense of comprising "too many"
observations. With a very large sample size (e.g. over 10,000 observations),
very many estimated coefficients tend to be significant at the 1%-level. My
co-author argues that such datasets with very many observations lead to
"inflated significance levels" and one should be careful about the
interpretation of the estimated standard errors. He suggests reducing the
sample size by randomly drawing a smaller sample from the original sample.
My questions are: 1) Can sample sizes be "too large", leading to standard
errors that are too small? 2) Does anybody have a reference to papers
discussing this issue? 3) Could it be related to possible misspecification
of the model?
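To illustrate what I mean, a minimal sketch with invented numbers: the
standard error from the full sample is smaller simply because n is larger,
and my co-author's suggestion of drawing a random subsample inflates it
again by roughly sqrt(N/n).

* illustrative simulation only; the effect size is invented
clear
set seed 2468
set obs 50000
generate x = rnormal()
generate y = 0.05*x + rnormal()   // modest true effect of 0.05
regress y x                       // full sample: SE about 0.0045, p < 0.01
preserve
sample 1000, count                // random subsample, as my co-author suggests
regress y x                       // same estimand, SE about 7 times larger
restore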
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/