Thanks to Kit Baum, -tuples- is now available from SSC.
You need Stata 8 or 9.
The writing of -tuples- was provoked by various recent
postings which have touched on cycling through possible
models using different subsets of predictors.
Assume the character of the model is decided, except for
precisely which predictors are used out of a given list.
Given p predictors, each one is in or out, 2
possibilities for each and thus 2^p overall. If you are
not interested in the single case of all of them out,
subtract 1 to get 2^p - 1.
For small p, it can be possible, although possibly not
-gllamm-orous, to look at all of the candidate models.
Note that some people doing this are in search of the
Grail, the "best" model; others are fired by scepticism
and exploring the extent to which quite different models
can appear about equally "good" (or "bad", as the case
may be). (Insert lengthy discussion about the weasel
words in quotation marks.)
Some users have offered their versions of how to do
it, including Paul Millar's -bic- and Alan Feiveson's
-tryem-, which in essence both cycle through models
and emit a digest of results.
In contrast, as I offer a new program -tuples- (which is
itself a modification of -selectvars-, on SSC) I want to
stress how little it does. You have to do all the
modelling yourself! In a way it is more like -levelsof-
than either -tryem- or -bic- (or -allpossible-, another
beast in the same part of the zoo).
In general, you give -tuples- a list of items
and it produces a bundle of local macros in the
caller's space, naming in turn all the possible singletons,
all the possible pairs and so forth. Here I feed it -frog toad newt-
and ask it to show what it is doing:
. tuples frog toad newt, display
tuple1: newt
tuple2: toad
tuple3: frog
tuple4: toad newt
tuple5: frog newt
tuple6: frog toad
tuple7: frog toad newt
Note that -frog toad newt- were not variables
in memory. -tuples- tries a list to see if it
is a varlist, but is happy if it is not. (Conversely
you can insist on the varlist interpretation, or
insist that no interpretation as varlist is allowed.)
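
As a sketch of those two choices, assuming the option names are
-varlist- and -asis- (do check -help tuples- for your version):

    * take the list literally, with no attempt to read it as a varlist
    * (option name -asis- is assumed here)
    tuples frog toad newt, asis display

    * insist on a varlist; a list that is not one should be an error
    * (option name -varlist- is assumed here)
    sysuse auto, clear
    tuples headroom trunk weight, varlist display
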
For a slightly more serious example, let us
imagine that our predictors are all car size variables:
. sysuse auto
(1978 Automobile Data)
. tuples head trunk-length displacement, display
tuple1: displacement
tuple2: length
tuple3: weight
tuple4: trunk
tuple5: headroom
tuple6: length displacement
tuple7: weight displacement
tuple8: weight length
tuple9: trunk displacement
tuple10: trunk length
tuple11: trunk weight
tuple12: headroom displacement
tuple13: headroom length
tuple14: headroom weight
tuple15: headroom trunk
tuple16: weight length displacement
tuple17: trunk length displacement
tuple18: trunk weight displacement
tuple19: trunk weight length
tuple20: headroom length displacement
tuple21: headroom weight displacement
tuple22: headroom weight length
tuple23: headroom trunk displacement
tuple24: headroom trunk length
tuple25: headroom trunk weight
tuple26: trunk weight length displacement
tuple27: headroom weight length displacement
tuple28: headroom trunk length displacement
tuple29: headroom trunk weight displacement
tuple30: headroom trunk weight length
tuple31: headroom trunk weight length displacement
The results match standard facts from elementary
combinatorics:
. di comb(5,1) + comb(5,2) + comb(5,3) + comb(5,4) + comb(5,5)
31
. di 2^5 - 1
31
Now, I suggest, you have the main tool you really need, as
everything else is already possible through standard tools.
Suppose you want to store regression R^2 and the predictor list
in two variables.
(1) initialise
gen rsq = .
gen predictors = ""
(2) loop
quietly forval i = 1/31 {
    regress mpg `tuple`i''
    replace rsq = e(r2) in `i'
    replace predictors = "`tuple`i''" in `i'
}
(3) play
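
What "play" means is up to you. As one possibility, and only a
sketch, you could list the models with the highest R^2. -gsort-
reorders the whole dataset, hence the -preserve- and -restore-:

    preserve
    * highest R^2 first; missing values are sorted last by default
    gsort -rsq
    list predictors rsq in 1/5
    restore
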
Suppose you prefer to put stuff in a file:
(1) initialise
<set up file>
(2) loop
quietly forval i = 1/31 {
    regress mpg `tuple`i''
    <post to file>
}
(3) play
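
One way to fill in those placeholders is -postfile-. This is a
sketch, not the only route: the handle name results and the file
name tuples_rsq are purely illustrative, and it assumes -tuples-
has already been run on the five predictors above.

    tempname results
    * one string and one numeric variable per model
    postfile `results' str80 predictors double rsq using tuples_rsq, replace
    quietly forval i = 1/31 {
        regress mpg `tuple`i''
        post `results' ("`tuple`i''") (e(r2))
    }
    postclose `results'
    * read the results back in and look at the highest R^2
    use tuples_rsq, clear
    gsort -rsq
    list in 1/5

The data in memory are replaced by the -use-, so save anything you
need first.
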
And so on. You can vary or complicate this arbitrarily,
varying model commands, etc., etc. Note that the local
macro tuple0 is not defined by -tuples-, so you can
exploit its non-existence:
quietly forval i = 0/31 {
    regress mpg `tuple`i''
    ...
}
`tuple0' evaluates to an empty string, so you can get
results for a null model with no predictors if you want.
Just don't try putting results in observation 0.
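
One way round that, sketched here on the assumption that you are
starting afresh (so rsq and predictors do not yet exist), is to
shift everything down by one observation. The local macro j is
just an illustrative name.

    sysuse auto, clear
    tuples headroom trunk weight length displacement
    gen rsq = .
    gen predictors = ""
    quietly forval i = 0/31 {
        regress mpg `tuple`i''
        * model i goes in observation i + 1, so the null model is first
        local j = `i' + 1
        replace rsq = e(r2) in `j'
        replace predictors = "`tuple`i''" in `j'
    }

The null model then sits in observation 1 with an empty predictor
list.
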
The algorithm of -tuples- is less efficient than
it could be because I twist results to be sure that
singletons, pairs, triples, ... are emitted in that
sequence. Somebody might have a smarter way of doing that.
-tuples- might have other applications beyond predictor choice.
If you object to doing powers of 2 in your head, the number
of tuples produced is itself returned in local macro -ntuples-.
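
So a loop need not hard-code the upper limit. A minimal sketch,
with the auto data in memory:

    tuples headroom trunk weight length displacement
    forval i = 1/`ntuples' {
        quietly regress mpg `tuple`i''
        * show R^2 and the predictors used, one model per line
        di %6.4f e(r2) "  `tuple`i''"
    }
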
Some other wrinkles are discussed in the help file.
Nick
[email protected]