In the closing post to the thread "st: Comparing change in rates -
frustrating problem: questionable results", Ricardo Ovaldia asked:
>One last thing, if the interaction term is not
>significant, does it still need to be included in the
>model?
Does anyone on the list have a reference to cite that provides guidance on
this matter? My understanding is that there is disagreement among the
experts.
It might be helpful to distinguish circumstances (for example,
model-building and hypothesis-testing) in which the question could arise.
There could legitimately be different rules for each.
On one hand, in a model-building exercise, terms are deleted in the interest
of parsimony and generalizability of the final statistical model of the data
or phenomenon. In this circumstance, terms could be deleted from the model
in accordance with a statistical criterion--for example a p-value
threshold--or a set of statistical and nonstatistical criteria.
On the other hand, in a hypothesis-testing setting, the statistical model is
constructed on the basis of subject-matter content before the data are in
hand. To delete a term here would be to change the nature of the hypothesis
tested by the study.
In practice, this simple-minded distinction might not be clear-cut. I have
seen statisticians eschew dropping nonsignificant terms from multiple
regression models, invoking arguments akin to those against stepwise
regression, but then advocate the use of sequential or hierarchical (SAS
Type I or Type II) sums of squares in factorial ANOVA with interaction.
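
As a rough illustration of why sequential (Type I) sums of squares resemble
term-dropping, here is a minimal sketch in plain Python (not Stata) for an
unbalanced 2x2 factorial. The data, the 0/1 coding, and the helper names
(solve, rss, sequential_ss) are all made up for this example. The sum of
squares for each main effect is the drop in residual sum of squares between
two models that both leave the interaction out, so it depends on the order of
entry; only the interaction, entered last, is unaffected by the order.

```python
# Made-up data for an unbalanced 2x2 factorial: (a, b, y), a and b coded 0/1.
data = [
    (0, 0, 10.0), (0, 0, 12.0), (0, 0, 11.0),
    (0, 1, 14.0), (0, 1, 15.0),
    (1, 0, 13.0), (1, 0, 12.0),
    (1, 1, 20.0), (1, 1, 22.0), (1, 1, 21.0), (1, 1, 19.0),
]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rss(terms):
    """Residual sum of squares from OLS of y on an intercept plus terms."""
    X = [[1.0] + [t(a, b) for t in terms] for a, b, _ in data]
    y = [row[2] for row in data]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def term_a(a, b):
    return float(a)

def term_b(a, b):
    return float(b)

def term_ab(a, b):
    return float(a * b)

def sequential_ss(order):
    """Type I SS: drop in RSS as each term is added in the given order."""
    result, current = {}, []
    previous = rss(current)          # intercept-only model
    for name, term in order:
        current.append(term)
        new = rss(current)
        result[name] = previous - new
        previous = new
    return result

# Same three terms, main effects entered in the two possible orders.
ss_ab = sequential_ss([("A", term_a), ("B", term_b), ("A:B", term_ab)])
ss_ba = sequential_ss([("B", term_b), ("A", term_a), ("A:B", term_ab)])

for label, ss in (("A then B", ss_ab), ("B then A", ss_ba)):
    print(label, {name: round(value, 3) for name, value in ss.items()})
```

In Stata the analogous comparison would be run with anova and its
sequential option; the sketch above only shows the arithmetic behind it.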
Joseph Coveney