Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Nick Cox" <n.j.cox@durham.ac.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Difference of means and t-test |
Date | Tue, 15 Jun 2010 17:59:27 +0100 |
As before, I see no difference in view here, despite what you say. P-values based on imputed Gaussians will be correct only insofar as the underlying distribution really was Gaussian, and otherwise dubious. Naturally, that is impossible to check without the original data, which in this circumstance we do not have. But often we have experience with similar data, a point often overlooked.

As Bertrand Russell said somewhere, the method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil.

Sure, the test statistics will be the same, but not their interpretation.

Nick
n.j.cox@durham.ac.uk

Richard Williams

At 02:25 PM 6/14/2010, Nick Cox wrote:
>I don't think our views are contradictory. It is clearly true that you
>can get results from summary statistics alone. But erecting fake
>Gaussians with those summaries is not equivalent to reconstructing the
>original data. That is my point, and no more. It is akin to arguments at
>a higher level about "sufficient statistics". If something is normal,
>then it is sufficient to know mean and sd, but there isn't a reverse
>argument.
>
>At 11:19 AM 6/14/2010, Nick Cox wrote:
> >-- except that will surely overstate the strength of the conclusions, in
> >so far as the real distributions are unlikely to be exactly Gaussian.

Still, it is incorrect to say that constructing fake Gaussians "will surely overstate the strength of the conclusions." The p values are based on various assumptions, e.g. normally distributed, homoskedastic errors. If the assumptions are wrong, the p values are wrong. But, whether the assumptions are correct or not, the calculation of the test statistics and coefficients is the same, i.e. for regression-type problems if you've got the means, correlations and standard deviations there are all sorts of things you can compute without having the rest of the data.
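A minimal sketch of the point about summary statistics, written in Python (standard library only) rather than Stata. The data values and the helper names `t_from_summary`, `t_from_data`, `rescale` and `fake_gaussian` are all hypothetical, made up for illustration. It assumes the pooled-variance two-sample t statistic, and it builds "fake" samples from normal quantiles rescaled to match each group's n, mean and sd exactly.

```python
import math
import statistics
from statistics import NormalDist


def t_from_summary(n1, m1, s1, n2, m2, s2):
    """Pooled-variance two-sample t statistic from n, mean and sd alone."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def t_from_data(x, y):
    """Same statistic, computed by first summarising the raw samples."""
    return t_from_summary(
        len(x), statistics.mean(x), statistics.stdev(x),
        len(y), statistics.mean(y), statistics.stdev(y),
    )


def rescale(values, mean, sd):
    """Shift and scale values so their sample mean and sd hit the targets."""
    m, s = statistics.mean(values), statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]


def fake_gaussian(n, mean, sd):
    """Symmetric 'fake Gaussian' sample: normal quantiles, then rescaled."""
    template = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return rescale(template, mean, sd)


# Hypothetical "real" data, deliberately skewed (made up for this sketch).
real_a = [1, 1, 2, 2, 3, 4, 6, 9, 15, 27]
real_b = [2, 3, 3, 4, 5, 7, 10, 14, 22, 40]

# All that is assumed to survive of the real data: n, mean and sd per group.
summary_a = (len(real_a), statistics.mean(real_a), statistics.stdev(real_a))
summary_b = (len(real_b), statistics.mean(real_b), statistics.stdev(real_b))

# Fake samples reproduce those summaries but not the skewed shape.
fake_a = fake_gaussian(*summary_a)
fake_b = fake_gaussian(*summary_b)

t_real, df_real = t_from_data(real_a, real_b)
t_fake, df_fake = t_from_data(fake_a, fake_b)
t_summary, df_summary = t_from_summary(*summary_a, *summary_b)

print("real data:    t =", t_real, "df =", df_real)
print("fake data:    t =", t_fake, "df =", df_fake)
print("summary only: t =", t_summary, "df =", df_summary)
```

The three t statistics agree up to floating-point error, because the statistic depends on the data only through n, mean and sd. The sketch says nothing about whether a p value read off the t distribution is trustworthy for data as skewed as these; that depends on the shape of the real data, which the fake samples do not preserve.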
You run a regression or Anova with the "fake" data and you'll get the exact same results as with the real data. Of course, without having the original data, you can't, say, do diagnostic tests of assumptions, analyze subsets of the data, add an x^2 term, etc. So, yes, you greatly prefer having the real data! But if the real data aren't available there is still a lot you can do.

I don't know why the original poster was using ttesti instead of ttest, but if it was because he only had summary statistics available to him then it would be possible for him to run an Anova the way I suggested and the numbers he would get would be the same as if he had the real data. There probably wouldn't be a whole lot else he could do though, e.g. the predict command and most other post-estimation commands won't be of much use without the real data.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/