I'd go further. Using P-values in this way places
extraordinary trust in their accuracy. Mixing that
trust with reliance on the common but still
arbitrary threshold of 0.05 is bizarre.
Still, it's your analysis, not mine.
Now if I understand you correctly, you have 20
predictors and want to run all possible regressions.
Each predictor is in or out, so that's 2^20 ~ 10^6
regressions by my count. The count is reduced by 1
if you exclude the regression with no predictors, but
that's not much help. A million or so regressions
does give you motivation for cutting it down to a mere
50,000 at a minimum, the 5% you would expect to pass
at the 0.05 level by chance alone. (Many more if most
of your models are not too bad.)
So you won't be able to
fit all the predictor lists in a local macro. You
will have to cycle over the integers from 1 up
and construct each predictor list on the fly.
There is code on SSC in the -selectvars- package
that may help, but it was never intended for a problem
this big.
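
To make the integer idea concrete, here is a rough sketch under
stated assumptions. The names x1, x2, ... are placeholders for your
predictors, and k is set to 4 only so that the loop runs quickly;
k = 20 would give the full 2^20 - 1 lists. Each integer i is read in
binary, and predictor j goes into the list whenever binary digit j
of i is 1.

```stata
* Sketch only: x1, x2, ... are placeholder predictor names
local k = 4
local n = 2^`k' - 1
local nlists = 0
forvalues i = 1/`n' {
    * start each pass with an empty predictor list
    local xvars
    forvalues j = 1/`k' {
        * predictor j is in list i if binary digit j of i is 1
        if mod(floor(`i' / 2^(`j' - 1)), 2) == 1 {
            local xvars `xvars' x`j'
        }
    }
    * the regression on the current list would go here
    local ++nlists
}
display "`nlists' predictor lists built"
```

Your regression command would go inside the outer loop, using the
contents of local macro xvars as the predictor list.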
Nick
Maarten Buis
> Standard significance levels no longer hold if you perform many
> tests. If you do enough tests you are sure to find something
> even if nothing is there. One way to deal with this is to apply a
> Bonferroni correction: you divide the significance level by the
> number of tests. Say you do 1000 tests; then your significance
> level is .05/1000 = .00005. Other, slightly less conservative
> corrections exist. Have a look at -help _mtest-.
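
As a rough illustration of that arithmetic: the numbers are those in
Maarten's example, the macro names are arbitrary, and the normal
critical value is only an approximation to whatever reference
distribution your model actually uses.

```stata
* Bonferroni: divide the overall level by the number of tests
local alpha = 0.05
local ntests = 1000
local pertest = `alpha'/`ntests'
* two-sided critical value under a normal approximation (an assumption)
local zcrit = invnormal(1 - `pertest'/2)
display "per-test level: " `pertest'
display "two-sided critical value: " `zcrit'
```

The practical effect is that the familiar 1.96 cutoff is replaced by
a much stricter one.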
Ulrich Kohler
> I would propose to display or collect the coefficients in a loop.
>
> Here is a starter for the display approach:
>
> ---------------------
> forvalues i=1/1000 {
>     quietly xtreg depvar indep1 indep2 indep3 indep4 if xyz == `i'
>     * coefficient, standard error, and 1 if |b/se| >= 1.96, else 0
>     display _b[indep4] " " _se[indep4] " " ///
>         (abs(_b[indep4]/_se[indep4]) >= 1.96)
> }
> -----------------------
>
> This displays the coefficient of independent variable 4, along with its
> standard error and a 1 if the coefficient is significant at roughly the
> 5% level (|b/se| >= 1.96) or a 0 if it is not.
>
> Here is a starter for the collect approach:
>
> -----------------
> postfile coefs i b se t using results
> forvalues i=1/1000 {
>     quietly xtreg depvar indep1 indep2 indep3 indep4 if xyz == `i'
>     * one row per regression: i, coefficient, standard error, t
>     post coefs (`i') (_b[indep4]) (_se[indep4]) ///
>         (_b[indep4]/_se[indep4])
> }
> postclose coefs
> -----------------
>
> This saves the coefficients in question, their standard errors and
> their t-values in the Stata file "results.dta". You can open this file
> with -use results, clear- and do whatever you like.
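
Once the results are in a dataset, picking out the significant ones
is a simple selection. A minimal sketch follows, with made-up numbers
typed in via -input- in place of results.dta so that it stands alone.
With the real file you would start from -use results, clear- instead,
and t would already be there. The variable sig and the 1.96 cutoff
are assumptions for illustration.

```stata
* Made-up numbers for illustration only; they stand in for results.dta
clear
input i b se
1  0.50 0.10
2  0.05 0.10
3 -0.40 0.15
end
generate t = b/se
* flag rows where |t| >= 1.96, roughly the 5% level
generate byte sig = abs(t) >= 1.96 if !missing(t)
list i b se t if sig == 1
```

A stricter cutoff, such as one from a multiple-testing correction,
can be swapped in for 1.96 without changing anything else.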
>
> The major problem will be to set up the loop for the 1000 regressions.
> We need more information to help you with that. Please do carefully
> read the help on -foreach- and -forvalues-. In addition you might
> profit from reading "Cox, N.J. Speaking Stata: How to face lists with
> fortitude. The Stata Journal, 2002, 2, 202-222". There is also a
> section on -foreach- and -forvalues- in the book "Data Analysis Using
> Stata", written by Frauke Kreuter and myself, which is available at
> the Stata bookstore.
Ilker Kaya
> > > I am currently running thousands of fixed effect regressions with
> > > panel data time series but the independent variable I am interested
> > > in is the same across all regressions. Right now I am manually
> > > reviewing all results and trying to get the significant ones at the
> > > alpha = 0.05 level. This process is taking a lot of time so is there
> > > any way to report only the results that satisfy this significance
> > > level only for that variable? For example I have 20 independent
> > > variables (1 to 20) and I am running all possible combinations but
> > > I only care about variable 4. How can I only see the results in
> > > which variable 4 is significant? I would really appreciate any help
> > > on this problem. Thanks in advance....