st: New versions of -qqvalue- and -smileplot- on SSC
From: "Roger B. Newson" <[email protected]>
To: "[email protected]" <[email protected]>, [email protected]
Subject: st: New versions of -qqvalue- and -smileplot- on SSC
Date: Thu, 11 Oct 2012 15:16:54 +0100
Thanks as always to Kit Baum, new versions of the packages -qqvalue- and
-smileplot- are available for download from SSC. In Stata, use the -ssc-
command to do this, or -adoupdate- if you already have old versions of
-qqvalue- and -smileplot-.
The -qqvalue- and -smileplot- packages are described below, as on my
website. They implement selections of frequentist multiple-test
procedures, inputting a variable containing P-values and outputting
q-values and discovery sets, respectively. Most statisticians nowadays
would argue that q-values are more informative than discovery sets.
However, discovery sets have been implemented in -smileplot- for a few
rarely-used multiple-test procedures, for which q-values are not
available in -qqvalue-.
The new version of -qqvalue- fixes a problem with the Sidak and
Holland-Copenhaver procedures, which caused them to output zero q-values
for P-values that were so small that, when subtracted from 1 in double
precision, they gave a result of 1. This has been fixed by using the
Bonferroni procedure as a substitute for the Sidak procedure, and the
Holm procedure as a substitute for the Holland-Copenhaver procedure, to
compute q-values for such tiny P-values. This substitution works because, in
the limit, as the input P-value tends to zero, the output Sidak q-value
converges in ratio to the output Bonferroni q-value, and the output
Holland-Copenhaver q-value converges in ratio to the output Holm
q-value. I would like to thank Tiago Pereira for drawing our attention
to this issue of tiny P-values on Statalist. See
http://www.stata.com/statalist/archive/2012-03/msg00726.html
for more about this correspondence.
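The tiny-P-value problem described above can be reproduced outside Stata.
The following is a minimal Python sketch, not the code used in -qqvalue-.
It assumes the standard one-step formulas (Sidak q-value 1-(1-p)^m,
Bonferroni q-value min(1, m*p)), and the function names are hypothetical,
chosen only for illustration.

```python
def sidak_q_naive(p, m):
    """One-step Sidak q-value computed directly as 1 - (1 - p)^m."""
    return 1.0 - (1.0 - p) ** m


def bonferroni_q(p, m):
    """One-step Bonferroni q-value, min(1, m * p)."""
    return min(1.0, m * p)


def sidak_q_guarded(p, m):
    """Sidak q-value, substituting Bonferroni when 1 - p rounds to 1.

    Assumption: this guard mirrors the substitution described in the
    post; the actual test used inside -qqvalue- may differ.
    """
    if 1.0 - p == 1.0:
        return bonferroni_q(p, m)
    return sidak_q_naive(p, m)


m = 1_000_000      # number of comparisons (illustrative)
tiny_p = 1e-20     # a P-value far below double-precision epsilon

print(1.0 - tiny_p == 1.0)          # True: 1 - p rounds to exactly 1
print(sidak_q_naive(tiny_p, m))     # 0.0: the spurious zero q-value
print(sidak_q_guarded(tiny_p, m))   # positive, roughly m * p

# Convergence in ratio for a small P-value that is still representable:
ratio = sidak_q_naive(1e-6, 10) / bonferroni_q(1e-6, 10)
print(ratio)                        # close to, and slightly below, 1
```

The last lines illustrate why the substitution is harmless for tiny
P-values: the ratio of the Sidak to the Bonferroni q-value approaches 1
as p shrinks.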
The new version of -smileplot- "fixes" a similar problem with the Sidak
and Holland-Copenhaver procedures when calculating critical P-values for
generating discovery sets. In this case, the problem is that, for an
input P-value p and a number m of multiple comparisons, the quantity
(1-p)^(1/m)
can sometimes be computed in double precision to give a result of 1,
either because p is tiny or because m is very large. I have again
substituted the Bonferroni and Holm formulas for the Sidak and
Holland-Copenhaver formulas for these cases. If the problem is a tiny
input P-value, then this should be a satisfactory solution, because the
convergence in ratio still applies. However, if the problem is a huge m
without a tiny p, then this solution will produce a conservative
corrected critical P-value, as the convergence in ratio does not apply
as m tends to infinity, in the way that it does as p tends to zero. On
the other hand, the corrected critical P-value will be less than the
value of zero that -smileplot- previously produced in this case. This
issue seems to me to be one more reason for preferring q-values to
discovery sets.
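The two cases above (tiny p versus huge m) can be contrasted with a short
Python sketch. It is an illustration under stated assumptions, not the
code in -smileplot-: it assumes the standard one-step formulas (Sidak
critical P-value 1-(1-p)^(1/m), Bonferroni critical P-value p/m), and all
function names are hypothetical. The reference function uses log1p and
expm1 purely as a numerically stable yardstick for comparison; it is not
the method described in this post.

```python
import math


def sidak_crit_naive(p, m):
    """Sidak corrected critical P-value computed directly."""
    return 1.0 - (1.0 - p) ** (1.0 / m)


def bonferroni_crit(p, m):
    """Bonferroni corrected critical P-value."""
    return p / m


def sidak_crit_guarded(p, m):
    """Sidak critical P-value, substituting Bonferroni when
    (1 - p)^(1/m) rounds to 1 in double precision.

    Assumption: this guard mirrors the substitution described in the
    post; the actual test used inside -smileplot- may differ.
    """
    if (1.0 - p) ** (1.0 / m) == 1.0:
        return bonferroni_crit(p, m)
    return sidak_crit_naive(p, m)


def sidak_crit_reference(p, m):
    """Numerically stable Sidak value, used here only as a yardstick."""
    return -math.expm1(math.log1p(-p) / m)


# Case 1: tiny p, modest m. The substitute agrees closely with Sidak.
p, m = 1e-20, 10
print(sidak_crit_naive(p, m))      # 0.0: the spurious zero
print(sidak_crit_guarded(p, m) / sidak_crit_reference(p, m))

# Case 2: ordinary p, huge m. The substitute is positive but conservative.
p, m = 0.05, 1e18
print(sidak_crit_naive(p, m))      # 0.0: the spurious zero
print(sidak_crit_guarded(p, m) / sidak_crit_reference(p, m))
```

In the second case the ratio stays below 1 however large m becomes,
because p/m divided by the Sidak value tends to p/(-ln(1-p)) rather than
to 1. That is the sense in which the substitute is conservative for a
huge m.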
I am considering submitting a brief Stata Journal article on this
precision issue with the Sidak and Holland-Copenhaver procedures, which
is potentially a trap for unsuspecting genome scanners.
Best wishes
Roger
-----------------------------------------------------------------------------------
package qqvalue from http://www.imperial.ac.uk/nhli/r.newson/stata10
-----------------------------------------------------------------------------------
TITLE
qqvalue: Generate frequentist q-values by inverting multiple-test
procedures
DESCRIPTION/AUTHOR(S)
qqvalue is similar to the R package p.adjust. It inputs a single
variable, assumed to contain P-values calculated for multiple
comparisons, in a dataset with 1 observation per comparison. It
outputs a new variable, containing the q-values corresponding to
these P-values, calculated by inverting a multiple-test procedure
specified by the user. These q-values represent, for each
corresponding P-value, the minimum uncorrected P-value threshold
for which that P-value would be in the discovery set, assuming that
the specified multiple-test procedure was used on the same set of
input P-values to generate a corrected P-value threshold. These
minimum uncorrected P-value thresholds may represent familywise
error rates or false discovery rates, depending on the procedure
used. Optionally, qqvalue may output other variables, containing
the various intermediate results used in calculating the
q-values. The multiple-test procedures available for
qqvalue are a subset of those available using the multproc module
of the smileplot package, which can be downloaded from SSC.
Author: Roger Newson
Distribution-Date: 08October2012
Stata-Version: 10
INSTALLATION FILES
qqvalue.ado
qqvalue.sthlp
-----------------------------------------------------------------------------------
package smileplot from http://www.imperial.ac.uk/nhli/r.newson/stata10
-----------------------------------------------------------------------------------
TITLE
smileplot: Multiple test procedures and smile plots
DESCRIPTION/AUTHOR(S)
This package contains the programs multproc, smileplot and
smileplot7. multproc inputs a data set with 1 observation for each
of a set of multiple significance tests and data on the P-values,
and carries out a multiple test procedure chosen by the user to
define a corrected overall critical P-value for accepting or
rejecting the null hypotheses tested. These procedures may be
one-step, step-up or step-down, and may control the familywise
error rate (eg the Bonferroni, Sidak, Holm, Holland-Copenhaver,
Hochberg and Rom procedures) or the false discovery rate (eg the
Simes, Benjamini-Liu, Benjamini-Yekutieli and
Benjamini-Krieger-Yekutieli procedures). smileplot, and its Stata 7
version smileplot7, work by calling multproc and then creating a
smile plot, with data points corresponding to multiple estimated
parameters, the P-values (on a reverse log scale) on the Y-axis,
and the corresponding parameter estimates (or another variable) on
the X-axis. There are Y-axis reference lines at the uncorrected and
corrected overall critical P-values. The reference line at the
corrected critical P-value, known as the parapet line, is
interpreted informally as a boundary between data mining and data
dredging. multproc, smileplot and smileplot7 are used on data sets
with one observation per estimated parameter and data on estimates
and their P-values, which may be created using parmby, parmest,
statsby or postfile.
Author: Roger Newson
Distribution-Date: 09october2012
Stata-Version: 10
INSTALLATION FILES
multproc.ado
multproc.sthlp
smileplot.ado
smileplot.sthlp
smileplot7.ado
smileplot7.sthlp
-----------------------------------------------------------------------------------
--
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.