RE: st: restricting margins to significant variables only
From
Richard Williams <[email protected]>
To
[email protected], [email protected]
Subject
RE: st: restricting margins to significant variables only
Date
Fri, 18 Mar 2011 15:32:15 -0500
At 12:21 PM 3/18/2011, Maarten Buis wrote:
--- On Fri, 18/3/11, Richard Williams wrote:
> I agree, and one of the things that has always troubled me
> is the view that diagnostic tests (and resulting model
> modifications) are good while stepwise regression is bad.
Doing model diagnostics and testing hypotheses require
different logics. Model diagnostics is all about a trade-off
between making the model simple enough that we understand the
results and complicated enough that it is close enough to
reality. Statistical testing is all about the trade-off between
the probability of rejecting a hypothesis when we should not and
the probability of rejecting it when we should. So, by performing a
test at the model diagnostic stage one is applying an
inappropriate logic for that decision.
Suppose, however, that based on diagnostic tests/visual inspections
we decide to add or transform variables in the model, e.g. we add
X^2, use ln(X) instead of X, or include an interaction term for
gender*income. These added/transformed variables have a pretty good
chance of being statistically significant, even though we may just be
capitalizing on chance features of the data. That doesn't mean we
shouldn't do diagnostic tests/visual inspections, but we should
realize that the P values for the final model may be deceptively good
and the results make us look more like "geniuses" than we really are.
Ergo, I agree with Nick when he says "those who want to select their
models in the light of the data and then write them up keeping
exactly the same view of P-values as if the model published is
exactly the model first thought of are playing a rather strange
game." (Even though I may have played that game myself!)
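
To make the "capitalizing on chance" point concrete, here is a rough simulation sketch in Python (not Stata, and not anything from the original exchange). Everything in it is an illustrative assumption: the outcome is pure noise, income is uniform on [1, 10], gender is a coin flip, each candidate term is tested on its own with a simple correlation test using a normal approximation, and the names approx_p_value and simulate are hypothetical. It compares how often a term chosen in advance looks significant with how often the best of the three terms mentioned above does.

```python
import math
import random


def approx_p_value(xs, ys):
    """Two-sided p-value for the correlation of xs and ys (normal approximation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def simulate(n_sims=1000, n_obs=200, alpha=0.05, seed=2011):
    """Return (prespecified_rate, selected_rate) when y is unrelated to everything."""
    rng = random.Random(seed)
    prespecified_hits = 0
    selected_hits = 0
    for _ in range(n_sims):
        income = [rng.uniform(1.0, 10.0) for _ in range(n_obs)]
        gender = [rng.randint(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # pure noise outcome
        candidates = [
            [x * x for x in income],                   # X^2
            [math.log(x) for x in income],             # ln(X)
            [g * x for g, x in zip(gender, income)],   # gender*income
        ]
        p_values = [approx_p_value(c, y) for c in candidates]
        # Term fixed before seeing the data: always the first candidate.
        if p_values[0] < alpha:
            prespecified_hits += 1
        # Term picked after looking: whichever candidate has the smallest p-value.
        if min(p_values) < alpha:
            selected_hits += 1
    return prespecified_hits / n_sims, selected_hits / n_sims


if __name__ == "__main__":
    prespecified, selected = simulate()
    print(f"term chosen in advance:              {prespecified:.3f}")
    print(f"best of three, chosen after looking: {selected:.3f}")
```

Because the smallest of several p-values can never exceed any single one of them, the "chosen after looking" rate is at least as high as the prespecified rate in every run. How much higher depends on how correlated the candidate terms are.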
If you are going to use stepwise selection, it is sometimes suggested
that you should use more stringent P values or verify the results on
a second data set. The same advice might be good when diagnostic
tests/visual inspections have significantly influenced the form of
the final model.
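
The second-data-set suggestion can be sketched the same way, again in Python and again under purely illustrative assumptions: a noise-only outcome, made-up distributions for income and gender, a normal-approximation correlation test, and hypothetical names (approx_p_value, simulate_holdout). Each simulated sample is split in half. The most promising term is picked on the first half and then retested on the second half, which played no part in the choice.

```python
import math
import random


def approx_p_value(xs, ys):
    """Two-sided p-value for the correlation of xs and ys (normal approximation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def simulate_holdout(n_sims=1000, n_obs=200, alpha=0.05, seed=2011):
    """Return (in_sample_rate, holdout_rate) for the term selected on the first half."""
    rng = random.Random(seed)
    half = n_obs // 2
    in_sample_hits = 0
    holdout_hits = 0
    for _ in range(n_sims):
        income = [rng.uniform(1.0, 10.0) for _ in range(n_obs)]
        gender = [rng.randint(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # pure noise outcome
        candidates = [
            [x * x for x in income],                   # X^2
            [math.log(x) for x in income],             # ln(X)
            [g * x for g, x in zip(gender, income)],   # gender*income
        ]
        # Selection step: pick the best-looking term using the first half only.
        first_half_p = [approx_p_value(c[:half], y[:half]) for c in candidates]
        best = first_half_p.index(min(first_half_p))
        if first_half_p[best] < alpha:
            in_sample_hits += 1
        # Verification step: retest that same term on the untouched second half.
        if approx_p_value(candidates[best][half:], y[half:]) < alpha:
            holdout_hits += 1
    return in_sample_hits / n_sims, holdout_hits / n_sims


if __name__ == "__main__":
    in_sample, holdout = simulate_holdout()
    print(f"significant on the half used for selection: {in_sample:.3f}")
    print(f"significant on the held-out half:           {holdout:.3f}")
```

Since the held-out half had no influence on which term was chosen, its rejection rate should sit near the nominal level, up to simulation noise and the normal approximation, while the rate on the selection half is inflated. The price is that each half has fewer observations, so real effects are also harder to detect.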
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam