Statalist



Re: st: AW: estimation of series of OLS regressions based on t-values from previous regression


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: AW: estimation of series of OLS regressions based on t-values from previous regression
Date   Fri, 30 Jan 2009 16:56:44 -0500

---



And, of course, throwing 200 variables into a stepwise algorithm presumes that they should enter as single, untransformed effects. If, on the contrary, there are important interactions or, for continuous predictors, necessary transformations, stepwise selection will not find them, because it does not look for them.
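A small simulation may make the point about interactions concrete. The sketch below is illustrative only: the data-generating model (a response driven purely by the product of two standard normal predictors), the sample size, the noise level, and the function names are all assumptions, not anything from the thread. Each predictor on its own is essentially uncorrelated with the response, so a screen on single untransformed effects would have no reason to keep either, while the product term carries most of the signal.

```python
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / (s_aa * s_bb) ** 0.5


def interaction_demo(n_obs=20000, seed=1):
    """Simulate y = x1*x2 + noise (an assumed, purely illustrative model).

    Returns the correlation of y with x1, with x2, and with x1*x2.
    """
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n_obs)]
    x2 = [rng.gauss(0, 1) for _ in range(n_obs)]
    product = [a * b for a, b in zip(x1, x2)]
    y = [p + 0.5 * rng.gauss(0, 1) for p in product]
    return pearson(y, x1), pearson(y, x2), pearson(y, product)


if __name__ == "__main__":
    r_x1, r_x2, r_prod = interaction_demo()
    print("corr(y, x1)    =", round(r_x1, 3))
    print("corr(y, x2)    =", round(r_x2, 3))
    print("corr(y, x1*x2) =", round(r_prod, 3))
```

Under these assumptions the first two correlations should hover near zero and the third should be large; a procedure that only ever tries x1 and x2 as main effects never sees the third.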

How fortunate that a knowledgeable expert can supply 12 candidate predictors! From a Bayesian perspective, that is good prior information. An additional benefit of starting with a small number of predictors: there is a reasonable chance that the final model(s) will be believable.

-Steve


On Jan 30, 2009, at 2:14 PM, Nick Cox wrote:

In addition to the very serious methodological issues quite rightly
raised, I'm wondering what kind of performance this would be expected to
produce. If I throw 200 noise predictors at a response, I expect to get
a pretty good R-square, for example. (They shouldn't create much
difficulty over multicollinearity....) Conversely, if 200 sensible
predictors aren't enough, why should one expect 200 more to do much
better?

Calibrating any procedure against what happens with stochastic garbage
would seem essential.
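One way to run that calibration is to regress pure noise on pure noise and see what R-square comes out. The sketch below is a rough illustration under stated assumptions: the sample size of 250 is hypothetical (the thread never gives one), the function name is made up, and the fit uses a simple Gram-Schmidt projection rather than any particular package's regression routine. It needs more observations than predictors plus one; otherwise the fit is perfect by construction.

```python
import random


def noise_r_squared(n_obs, n_pred, seed=0):
    """R-square from OLS (with intercept) of a noise response on noise predictors.

    The fit is computed by projecting the centered response onto the span of
    the centered predictors, one orthonormalized column at a time.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    mean_y = sum(y) / n_obs
    resid = [v - mean_y for v in y]  # centering is equivalent to fitting the intercept
    tss = sum(r * r for r in resid)  # total sum of squares
    basis = []  # orthonormal versions of the predictors added so far
    for _ in range(n_pred):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        mean_x = sum(x) / n_obs
        x = [v - mean_x for v in x]
        for q in basis:  # modified Gram-Schmidt step against earlier columns
            c = sum(a * b for a, b in zip(x, q))
            x = [a - c * b for a, b in zip(x, q)]
        norm = sum(a * a for a in x) ** 0.5
        if norm < 1e-10:  # numerically collinear column adds nothing; skip it
            continue
        q = [a / norm for a in x]
        c = sum(a * b for a, b in zip(resid, q))
        resid = [a - c * b for a, b in zip(resid, q)]  # remove the part explained by q
        basis.append(q)
    rss = sum(r * r for r in resid)  # residual sum of squares
    return 1 - rss / tss


if __name__ == "__main__":
    # 250 observations is an assumed sample size, chosen only for illustration.
    print("R-square with 200 noise predictors:", round(noise_r_squared(250, 200), 3))
```

With independent normal noise on both sides, the expected R-square is p/(n-1), so about 0.80 for 200 predictors and 250 observations. Any R-square reported from a real search over 200 candidates would need to be judged against a baseline of that kind.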

P.S. anyone contemplating stepwise who hasn't read Frank Harrell,
Regression modeling strategies, Springer, New York 2001, should seek it
out straight away.

Nick
[email protected]

Maarten Buis
Sent: 30 January 2009 16:55
To: [email protected]
Subject: RE: st: AW: estimation of series of OLS regressions based on
t-values from previous regression

--- sdm1 <[email protected]> wrote:
I'm afraid that it will have to be done (considerably) more than
once!  If anyone could offer an idea of how to get started with a
program, that would be much appreciated.

You seem to be aware that applying this method on that number of
variables will mean that your results cannot in any way be generalized
outside your data. Just out of curiosity, could you give a bit of
substantive background behind your problem, showing that
generalizability is not of interest in your case? It would be nice to
have a real life example of an exception to the rule that
stepwise/datamining/datasnooping is evil.
