Roger Harbord's posting on -matvsort- had
me scuttling to look and see what on Earth
it did. That reminded me indirectly of a utility
I have long wanted, on and off, but have never
seen or got round to writing: a program to
sort the rows and/or columns of a matrix
according to some summary of the elements
in the rows and/or columns.
Here's a very simple example. Suppose
we look at the car size variables in
the auto data:
. sysuse auto
(1978 Automobile Data)
. corr head trunk weight length displacement
(obs=74)
             | headroom    trunk   weight   length displa~t
-------------+---------------------------------------------
    headroom |   1.0000
       trunk |   0.6620   1.0000
      weight |   0.4835   0.6722   1.0000
      length |   0.5163   0.7266   0.9460   1.0000
displacement |   0.4745   0.6086   0.8949   0.8351   1.0000
For display, we might want to reorder that matrix, for
example to get clusters of high correlations and low
correlations together, as far as possible. (We might
also want fewer than 4 d.p.) The first could be
achieved by detailed inspection and re-typing the variable
names in different order, but an automated solution is
also desirable, especially for much bigger problems.
A first step is to get the correlations into a matrix
in the sense of Stata's -matrix- commands.
There are several ways to do that. One is -matcorr- from
STB-56:
. matcorr head trunk weight length displacement , matrix(corr)
(obs=74)
< same matrix, naturally >
Then -matsusort- (now added to the -matvsort- package on
SSC, thanks to Kit Baum) sorts the rows according to
their means. That is,
for each row {
    calculate the mean of the row elements
}
sort the rows according to the order of their means
The -decrease- option controls which way
they are sorted, and the -columns- option
sorts columns instead of rows.
. matsusort corr scorr, dec
. matsusort scorr scorr, col dec
We now have more control, e.g. over format, from -matrix list-:
. mat li scorr , format(%9.3f)
symmetric scorr[5,5]
                    length        weight  displacement         trunk      headroom
      length         1.000
      weight         0.946         1.000
displacement         0.835         0.895         1.000
       trunk         0.727         0.672         0.609         1.000
    headroom         0.516         0.483         0.474         0.662         1.000
. mat li scorr , format(%9.3f) nohalf
symmetric scorr[5,5]
                    length        weight  displacement         trunk      headroom
      length         1.000         0.946         0.835         0.727         0.516
      weight         0.946         1.000         0.895         0.672         0.483
displacement         0.835         0.895         1.000         0.609         0.474
       trunk         0.727         0.672         0.609         1.000         0.662
    headroom         0.516         0.483         0.474         0.662         1.000
The sorting by means is just the default. There is a handle allowing you
to sort according to _any_ summary measure produced by -summarize-. It's
unlikely that anyone would choose to sort by kurtosis, but the generality
is cheap.
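The reason the generality is cheap can be sketched in a few lines of
Python: the summary is just a function handed to the sort. This is an
illustration only, not how -matsusort- is written; sort_rows_by and
its summary argument are hypothetical names, and the 3 x 3 matrix is
invented for the example.

```python
from statistics import mean, median

def sort_rows_by(matrix, names, summary=mean, decrease=False):
    """Reorder rows by any summary function of the row elements."""
    order = sorted(range(len(matrix)),
                   key=lambda i: summary(matrix[i]),
                   reverse=decrease)
    return [matrix[i] for i in order], [names[i] for i in order]

# Invented data: rows chosen so that different summaries give different orders
labels = ["a", "b", "c"]
m = [
    [1.0, 5.0, 9.0],   # mean 5.00, median 5, min 1
    [4.0, 4.0, 4.0],   # mean 4.00, median 4, min 4
    [2.0, 6.0, 6.0],   # mean 4.67, median 6, min 2
]

_, by_mean = sort_rows_by(m, labels)                          # default summary
_, by_median = sort_rows_by(m, labels, summary=median)
_, by_min_desc = sort_rows_by(m, labels, summary=min, decrease=True)

print(by_mean, by_median, by_min_desc)
```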
You can get this, bundled with the other stuff previously in -matvsort-, by
. ssc inst matvsort
or
. ssc inst matvsort, replace
Nick