Thanks to Kit Baum, an -allpossible- module
is now available on SSC.
Setting aside inappropriate and hubristic
overtones of omnicompetence, the creation of
this beast was driven by a particular need.
An outside critic of a current project
urged the merits of trying all possible
subsets of predictors, given a response variable measured on
the ground and reflectances measured
by satellite in several spectral bands.
We have many reservations about shotgun-assisted
model building, but we need the evidence
before we can discuss it.
That aside, -allpossible- is best understood
by example, although not with our data, which you don't have:
. use auto, clear
(1978 Automobile Data)
. gen gpm = 1 / mpg
. allpossible reg gpm head-displ, s(r2_a rmse)
----------------------------------------------------
    model | predictors                r2_a      rmse
----------+-----------------------------------------
        1 | (none)                   0.000     0.013
        2 | 1                        0.170     0.012
        3 | 2                        0.392     0.010
        4 | 3                        0.726     0.007
        5 | 4                        0.667     0.007
        6 | 5                        0.561     0.008
        7 | 6                        0.589     0.008
        8 | 1 2                      0.383     0.010
        9 | 1 3                      0.723     0.007
       10 | 1 4                      0.663     0.007
       11 | 1 5                      0.569     0.008
       12 | 1 6                      0.588     0.008
       13 | 2 3                      0.729     0.007
       14 | 2 4                      0.666     0.007
       15 | 2 5                      0.607     0.008
       16 | 2 6                      0.627     0.008
       17 | 3 4                      0.724     0.007
       18 | 3 5                      0.724     0.007
       19 | 3 6                      0.723     0.007
       20 | 4 5                      0.671     0.007
       21 | 4 6                      0.688     0.007
       22 | 5 6                      0.645     0.008
       23 | 1 2 3                    0.726     0.007
       24 | 1 2 4                    0.661     0.007
       25 | 1 2 5                    0.602     0.008
       26 | 1 2 6                    0.624     0.008
       27 | 1 3 4                    0.720     0.007
       28 | 1 3 5                    0.720     0.007
       29 | 1 3 6                    0.719     0.007
       30 | 1 4 5                    0.666     0.007
       31 | 1 4 6                    0.684     0.007
       32 | 1 5 6                    0.642     0.008
       33 | 2 3 4                    0.725     0.007
       34 | 2 3 5                    0.726     0.007
       35 | 2 3 6                    0.725     0.007
       36 | 2 4 5                    0.670     0.007
       37 | 2 4 6                    0.687     0.007
       38 | 2 5 6                    0.663     0.007
       39 | 3 4 5                    0.721     0.007
       40 | 3 4 6                    0.720     0.007
       41 | 3 5 6                    0.720     0.007
       42 | 4 5 6                    0.687     0.007
       43 | 1 2 3 4                  0.722     0.007
       44 | 1 2 3 5                  0.723     0.007
       45 | 1 2 3 6                  0.722     0.007
       46 | 1 2 4 5                  0.666     0.007
       47 | 1 2 4 6                  0.684     0.007
       48 | 1 2 5 6                  0.659     0.007
       49 | 1 3 4 5                  0.717     0.007
       50 | 1 3 4 6                  0.716     0.007
       51 | 1 3 5 6                  0.716     0.007
       52 | 1 4 5 6                  0.683     0.007
       53 | 2 3 4 5                  0.722     0.007
       54 | 2 3 4 6                  0.721     0.007
       55 | 2 3 5 6                  0.722     0.007
       56 | 2 4 5 6                  0.686     0.007
       57 | 3 4 5 6                  0.717     0.007
       58 | 1 2 3 4 5                0.719     0.007
       59 | 1 2 3 4 6                0.718     0.007
       60 | 1 2 3 5 6                0.719     0.007
       61 | 1 2 4 5 6                0.683     0.007
       62 | 1 3 4 5 6                0.713     0.007
       63 | 2 3 4 5 6                0.718     0.007
       64 | 1 2 3 4 5 6              0.715     0.007
----------------------------------------------------
1 headroom
2 trunk
3 weight
4 length
5 turn
6 displacement
More generally, -allpossible- by default (1) computes all
possible models fitted by a model command to a response
and subsets of up to 6 predictors and (2) tabulates a list
of statistics for each model fitted.
Alternatively, (1') the maximum number of predictors fitted may be
specified as a number less than 6. The model command must be a
command fitting a model to a single response variable.
In the example above, it is -regress-; in our project,
it is -glm-.
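For readers who want to see the idea of "all possible subsets" in miniature, here is a rough sketch of my own. It is not the code inside -allpossible-, and it assumes a recent Stata with -sysuse-. It loops over the 2^3 = 8 subsets of three predictors by treating a counter as a bit pattern, so the models come out in bit-pattern order rather than grouped by subset size as in the table above. The local macro names are purely illustrative.

```stata
sysuse auto, clear
generate gpm = 1 / mpg

* three candidate predictors, hence 2^3 = 8 subsets including the empty one
local xvars headroom trunk weight
local p : word count `xvars'
local last = 2^`p' - 1

forvalues m = 0/`last' {
    local subset
    forvalues j = 1/`p' {
        * bit (j - 1) of m says whether predictor j is in this subset
        if mod(floor(`m' / 2^(`j' - 1)), 2) == 1 {
            local x : word `j' of `xvars'
            local subset `subset' `x'
        }
    }
    quietly regress gpm `subset'
    display %4.0f `m' + 1 "  " %6.3f e(r2_a) "  " %6.3f e(rmse) "  `subset'"
}
```

The empty subset fits the constant alone, which is why the first model in the table above is labelled (none).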
The list of statistics must include one or more names of
e-class results, as would be displayed by -estimates list-
after fitting an individual model.
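To check which names are eligible for the statistics list, one can fit a single model by hand and look at what it leaves behind. A minimal sketch, assuming a recent Stata in which -sysuse- loads the auto data and -ereturn list- plays the role that -estimates list- plays under Stata 7; weight alone should correspond to model 4 in the table above.

```stata
sysuse auto, clear
generate gpm = 1 / mpg

* fit one model and list every e-class result it stores
quietly regress gpm weight
ereturn list

* the two statistics requested in the example above
display %6.3f e(r2_a) "  " %6.3f e(rmse)
```

Any scalar name shown in that listing, such as r2_a or rmse, is the kind of name the statistics list expects.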
Naturally, this command does not purport
to replace the detailed scrutiny of individual models or to offer
an unproblematic way of finding "best" models. Its main use may lie
in demonstrating that, within many projects, several models
possess roughly equal merit as measured by omnibus statistics.
In fact, I can see this featuring in my own teaching
together with suitable homilies and injunctions.
The magic number 6 does not reflect any principle; it is
as far as I got given that we have 6 spectral bands in
our specific satellite data. Having been brought up
on the idea that with seven parameters you can fit
an elephant, I have some inhibitions about going
further. In any case, looking at all 2^7 = 128
fits with 7 predictors creates a longer table
than might be wished. Let me stress that
the limit of 6 applies to the number of predictors
included in any one model; you can
specify more candidate predictors if you like,
so long as the total number of models fitted
does not exceed the number of observations.
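The arithmetic behind that restriction can be checked before running anything. A small sketch, in which the locals p and cap are hypothetical names for the number of candidate predictors and the per-model limit: the number of models is the sum of comb(p, k) for k from 0 up to the smaller of p and cap.

```stata
sysuse auto, clear

local p = 8       // candidate predictors (illustrative value)
local cap = 6     // at most this many predictors in any one model
local kmax = min(`p', `cap')

* count subsets of size 0, 1, ..., kmax
local total = 0
forvalues k = 0/`kmax' {
    local total = `total' + comb(`p', `k')
}

display "models to fit: `total'   observations: " _N
```

With 8 candidates and a limit of 6 this gives 247 models, well above the 74 observations in the auto data, whereas 6 candidates give the 64 models tabulated above.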
Stata 7 required.
In searching for earlier work in this direction,
I was able to draw upon ideas in the -rsquare-
program of Philip Ender and Rie von Eyben of UCLA,
which has different but overlapping aims. It
saved me a lot of time. Phil tells me there is
something similar in SAS.
Nick