You will want to switch to False Discovery Rate (FDR) methods for
dealing with the multiple-testing problem, or at least consider doing
so. Read "Newson R and the ALSPAC Study Team. Multiple-test
procedures and smile plots. The Stata Journal 2003; 3(2): 109-132."
Also see his presentation notes on the same topic at
<http://www.kcl-phs.org.uk/rogernewson/papers.htm>. I think the
"holland" option was in there only because I wrote that piece of code
before Roger included FDR methods.
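
For anyone new to FDR control, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, probably the best-known FDR method. It is written in Python rather than Stata purely for illustration. The function name and the example p-values are made up, and this is not the code inside smileplot, so check the smileplot help for the method name that corresponds to it.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the LARGEST rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, fdr=0.05))
```

Because the rule looks for the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value further down the list meets its threshold.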
-Dave
On Thursday, October 9, 2003, at 01:42 PM, Wallace, John wrote:
Thanks very much, David. I adapted the code you supplied and it worked
amazingly well. I had my 3 sets of t values in 12 seconds. The Holland
method is shown as an "improved sequentially rejective Bonferroni test
procedure" in the abstract listing from the smileplot help. I'm
familiar with the Bonferroni correction, but is there a nutshell
explanation for what the Holland method is doing in establishing a
significance threshold?
Thanks also to Nick Winter, Nick Cox, and William Gould for the
original ttest speed discussion and resulting code. I'm awash in vast
seas of raw data at the moment, and efficient statistics are becoming
more important with each passing day (and microarray version).
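
As a nutshell answer to the question above, here is a minimal sketch of a step-down procedure with Sidak-style thresholds, which is my understanding of what the Holland method does: sort the p-values, compare the i-th smallest of m against 1 - (1 - alpha)^(1/(m - i + 1)), and stop rejecting at the first failure. It is in Python rather than Stata purely for illustration. The function name and example values are hypothetical, and it is not taken from the smileplot source, so treat it as an assumption to check against Newson's paper.

```python
def holland_step_down(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        # Sidak-style threshold for the rank-th smallest p-value;
        # it loosens as fewer hypotheses remain untested.
        threshold = 1.0 - (1.0 - alpha) ** (1.0 / (m - rank + 1))
        if pvalues[idx] > threshold:
            break  # step-down: stop at the first p-value that fails
        reject[idx] = True
    return reject


# Hypothetical p-values, for illustration only.
print(holland_step_down([0.001, 0.02, 0.03, 0.5], alpha=0.05))
```

Compared with plain Bonferroni, which tests every p-value against alpha / m, the thresholds here start slightly above alpha / m and relax with each rejection, so the procedure can only reject as many hypotheses or more.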
-----Original Message-----
From: David Airey [mailto:[email protected]]
Sent: Wednesday, October 08, 2003 10:22 PM
To: [email protected]
Cc: Wallace, John
Subject: re: st: Processing speed for ttest
John,
See the threads started by me on
Subject: st: give me some speed
Date: Thu, 4 Apr 2002 14:07:38 -0600
on analysis of Affy chip data by ttests and the many helpful responses.
You have to use the Harvard archives to find this and related threads.
There is a way to make the ttests very, very fast (2 seconds for 12K
ttests), pointed out by Nick Winter, the winner of the speed contest.
But your intuition to use an alternative form of the ttest (regress)
will get you a quicker result too, as William Gould detailed.
clear
set more off
insheet using cxb.txt
/*
f1 f2 f3 f4 f5 f6 c1 c2 c3 c4 c5 c6 <--wide format, forebrain and
cerebellum results for animals 1 to 6.
*/
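* Record the start time
display "$S_TIME"
* Paired differences, forebrain minus cerebellum, for each animal
forvalues j = 1/6 {
    generate d`j' = f`j'-c`j'
}
* Row-wise mean, SD, and non-missing count of the differences
egen dif_mean = rmean(d1-d6)
egen dif_sd = rsd(d1-d6)
egen dif_n = robs(d1-d6)
* Paired t statistic and its two-tailed p-value
generate t = dif_mean/(dif_sd/sqrt(dif_n))
generate tp = tprob(dif_n-1,t)
* Record the end time
display "$S_TIME"
save Winter, replace
* Holland multiple-test procedure; retain flags null hypotheses left credible
smileplot, pvalue(tp) estimate(t) method(holland) nhcred(retain)
* List the tests whose null hypothesis was rejected
list t tp if retain == 0
set more on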
* this takes 2 seconds for all 12422 ttests, or 0.016 seconds per 100 ttests
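
For readers without Stata to hand, the per-row arithmetic in the code above can be sketched as follows. This is an illustrative Python version, not part of the original program. The function name, the use of None for missing values, and the example numbers are all assumptions made for the sketch.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t statistic and degrees of freedom for one row of data."""
    # Differences for pairs where both values are present (None = missing).
    diffs = [a - b for a, b in zip(first, second)
             if a is not None and b is not None]
    n = len(diffs)
    if n < 2:
        return None, n - 1  # too few pairs to estimate a standard deviation
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_diff == 0:
        return None, n - 1  # t is undefined when all differences are equal
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical expression values for one probe set in six animals.
forebrain = [5, 7, 9, 6, 8, 7]
cerebellum = [4, 5, 6, 5, 5, 5]
print(paired_t(forebrain, cerebellum))
```

The speed of the Stata version comes from doing this arithmetic for all rows at once with generate and egen, rather than calling ttest once per row.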