I have a more general question arising obliquely
out of these issues.
The point of these multiple comparison procedures,
Bonferroni, Scheffé, Šidák, etc., is, as I understand
it, to inject a strong note of caution given the number
of individual tests you could carry out and the built-in
tendency that the more of them you carry out, the more
will attain significance at some conventional level by
chance alone.
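(To put a rough number on that tendency, here is a
back-of-the-envelope sketch in Python, assuming the
tests are independent, which real comparisons usually
are not:)

    # Familywise error rate for m independent tests, each at level alpha:
    # P(at least one false positive) = 1 - (1 - alpha)^m
    alpha, m = 0.05, 20
    print(1 - (1 - alpha) ** m)  # roughly 0.64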
What is the attitude to fishing _among_ multiple
comparison procedures, i.e. looking across the various
post hoc results, with the pitfall that you're tempted
to report the one closest to your preconceived
(ne)science?
Aren't you supposed to cleave to the one whose
inferential logic you find most compelling?
Is this a documented issue?
If you use a multiple-test procedure, then it should be chosen a priori,
rather than by fishing among multiple comparison procedures to see which
gives the answer you most like. In general, statistical theory assumes that
scientists first decide what they want to measure, and then measure it. The
validity of confidence regions stands or falls by that assumption. And this
is still true if the confidence region is for a non-numeric, set-valued
parameter, such as "the set of null hypotheses that are true".