In addition to the very serious methodological issues quite rightly
raised, I'm wondering what kind of performance this would be expected to
produce. If I throw 200 noise predictors at a response, I expect to get
a pretty good R-square, for example. (They shouldn't cause much trouble
with multicollinearity, either.) Conversely, if 200 sensible predictors
aren't enough, why should one expect 200 more to do much better?
Calibrating any procedure against what happens with stochastic garbage
would seem essential.
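The R-square point can be checked directly. If the response and all p predictors are independent noise and an intercept is included, the OLS R-square has expectation p/(n - 1), so 200 garbage predictors on, say, 400 observations give about 0.5 on average, and p = n - 1 gives a perfect fit. A minimal sketch in Python with NumPy (the sample sizes and the function name noise_r_squared are hypothetical, chosen only for illustration) of this kind of calibration against stochastic garbage:

```python
import numpy as np

def noise_r_squared(n, p, seed=0):
    """R-square from OLS of a pure-noise response on p pure-noise
    predictors plus an intercept (hypothetical calibration helper)."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)
    # Intercept column followed by p columns of independent N(0, 1) noise
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

if __name__ == "__main__":
    n = 400  # hypothetical sample size
    for p in (0, 50, 200, 399):
        # Expected value under the null is p / (n - 1)
        print(p, round(noise_r_squared(n, p), 3), round(p / (n - 1), 3))
```

Any R-square a real model achieves with k predictors is only impressive relative to what k noise predictors would deliver on the same n.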
P.S. Anyone contemplating stepwise regression who hasn't read Frank
Harrell, Regression Modeling Strategies (Springer, New York, 2001),
should seek it out straight away.
Nick
[email protected]
Maarten Buis
Sent: 30 January 2009 16:55
To: [email protected]
Subject: RE: st: AW: estimation of series of OLS regressions based on t-values from previous regression
--- sdm1 <[email protected]> wrote:
I'm afraid that it will have to be done (considerably) more than
once! If anyone could offer an idea of how to get started with a
program, that would be much appreciated.
You seem to be aware that applying this method to that number of
variables will mean that your results cannot in any way be generalized
outside your data. Just out of curiosity, could you give a bit of
substantive background to your problem, showing that generalizability
is not of interest in your case? It would be nice to have a real-life
example of an exception to the rule that stepwise regression/data
mining/data snooping is evil.
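One way to see why such results don't generalize is to run a procedure like the one in the subject line (a series of OLS regressions, each dropping a variable based on the t-values from the previous one) on pure noise. The sketch below is a hypothetical Python/NumPy version of backward elimination, not the original poster's program; the cutoff |t| >= 2 and the helper names ols_t_values and backward_eliminate are assumptions for illustration. Whatever survives is, by construction, spurious.

```python
import numpy as np

def ols_t_values(X, y):
    """t-statistics of OLS coefficients; X already includes an intercept column."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    return beta / se

def backward_eliminate(X, y, t_crit=2.0):
    """Refit repeatedly, dropping the predictor with the smallest |t|,
    until every remaining |t| >= t_crit (hypothetical cutoff).
    Column 0 is the intercept and is never dropped.
    Returns the column indices of the surviving predictors."""
    keep = list(range(1, X.shape[1]))
    while keep:
        t = np.abs(ols_t_values(X[:, [0] + keep], y))[1:]  # skip intercept
        worst = int(np.argmin(t))
        if t[worst] >= t_crit:
            break
        keep.pop(worst)
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 400, 200  # hypothetical sizes
    y = rng.standard_normal(n)
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
    survivors = backward_eliminate(X, y)
    print(len(survivors), "of", p, "pure-noise predictors survive")
```

Every surviving predictor will look "significant" in the final model, yet none has any relationship with the response.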