Statalist The Stata Listserver



Re: st: Too many sample points in microeconometric analyses?


From   Maarten buis <[email protected]>
To   [email protected]
Subject   Re: st: Too many sample points in microeconometric analyses?
Date   Wed, 25 Apr 2007 14:33:34 +0100 (BST)

--- Karsten Staehr <[email protected]> wrote:
> I have discussed with a co-author whether datasets used for
> microeconometric analyses can be "too large" in the sense of
> comprising "too many" observations? With a very large sample size
> (e.g. over 10,000 observations), very many estimated coefficients
> tend to be significant at the 1%-level. My co-author argues that
> such datasets with very many observations lead to "inflated
> significance levels" and one should be careful about the
> interpretation of the estimated standard errors. He suggests
> reducing the sample size by randomly drawing a smaller sample from
> the original sample.
>
> My questions are: 1) Can sample sizes be "too large", leading to too
> small standard errors? 2) Does anybody have a reference to papers
> discussing this issue? 3) Could it be related to possible
> misspecification problems of the model?

This is not a case of the standard error being too small; it tells you
exactly what it should: the degree to which you are uncertain about the
estimate, where the source of that uncertainty is the fact that the
estimate is based on a random sample from the population. A larger
sample simply means you are more certain. However, there are two issues
here:

1) With very large datasets the null hypothesis gets rejected even if
the difference is trivially small, i.e. you can be sure that the
difference from the null hypothesis is not due to sampling error, but
it is so small that it is not substantively interesting (the short
simulation sketch below illustrates this).

2) Once the sample gets very large, other sources of uncertainty
besides random sampling may become relatively more important. Including
these other sources of uncertainty requires quite a big shift in
paradigm: from frequentist (the `normal' statistics you were taught,
and virtually all statistics in Stata) to Bayesian (virtually absent
from Stata). What your co-author suggests amounts to inflating the
uncertainty due to random sampling error so that it `swamps' any other
uncertainty. That does not seem like a good approach to me.
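
To make point 1) concrete, here is a minimal simulation sketch in Stata
(entirely artificial data; the variable names and the tiny true effect
of 0.02 are just illustrative assumptions, not from the original post):

    clear
    set seed 12345
    set obs 100000
    generate x = rnormal()
    generate y = 0.02*x + rnormal()   // the true effect is trivially small
    regress y x                       // yet the p-value is typically < 0.01

The coefficient on x is estimated very precisely, so the test correctly
reports that it is not zero; whether an effect of 0.02 matters is a
substantive question that no p-value can answer.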

A nice reference is Raftery (1995).
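
If a full Bayesian analysis is more than you want, the BIC that Raftery
advocates is a rough, sample-size-aware compromise: its penalty grows
with log(n), so in huge samples it demands much more evidence than a 1%
significance test before preferring the larger model. A sketch, reusing
the hypothetical variables from the simulation above:

    quietly regress y x
    estat ic                  // BIC for the model with x
    quietly regress y
    estat ic                  // BIC for the constant-only model

The model with the smaller BIC is preferred; Raftery (1995) discusses
how large a BIC difference should count as real evidence.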

Hope this helps,
Maarten

Raftery, Adrian E. 1995. "Bayesian Model Selection in Social Research."
Sociological Methodology 25: 111-163.




-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------




