The output of -correlate- given a varlist of two or more variables
is a matrix of correlations for every pair of variables in
varlist. How could we produce an equivalent directly for
-spearman-? The key is to know that -spearman- leaves its
correlation behind in r(rho):
. makematrix, from(r(rho)) : spearman head trunk length displacement weight
                 headroom        trunk       length displacement       weight
    headroom            1
       trunk    .67678924            1
      length    .53235996    .71907323            1
displacement    .47845891    .57664675    .85248218            1
      weight    .52808385    .65644851    .94895697    .90538822            1
The result is displayed using -matrix list- and we will normally
want to tidy up the presentation, say by
. makematrix, from(r(rho)) format(%4.3f) : spearman head trunk length displacement weight
                 headroom        trunk       length displacement       weight
    headroom        1.000
       trunk        0.677        1.000
      length        0.532        0.719        1.000
displacement        0.478        0.577        0.852        1.000
      weight        0.528        0.656        0.949        0.905        1.000
However, let us leave these details of presentation to one side.
In this case, given a bivariate command, a varlist, and a single
result from which to compile the matrix, -makematrix- takes each
pair of variables from varlist, runs the bivariate command for
that pair, and puts the single result in the cell defined by that
pair of variables. So both rows and columns are specified by
varlist.
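The pairwise logic just described can be sketched by hand. This is only an illustration of the idea, not the code of -makematrix- itself. It assumes the auto data shipped with Stata (loaded here with -sysuse auto-), and the names vars, k, and R are arbitrary choices made for the sketch.

```stata
sysuse auto, clear
local vars headroom trunk length displacement weight
local k : word count `vars'

* start from a k x k matrix of missings, labelled by the varlist
matrix R = J(`k', `k', .)
matrix rownames R = `vars'
matrix colnames R = `vars'

forvalues i = 1/`k' {
    local row : word `i' of `vars'
    * a variable correlates perfectly with itself
    matrix R[`i', `i'] = 1
    local last = `i' - 1
    forvalues j = 1/`last' {
        local col : word `j' of `vars'
        * one bivariate run per pair; spearman leaves its result in r(rho)
        quietly spearman `row' `col'
        matrix R[`i', `j'] = r(rho)
        matrix R[`j', `i'] = r(rho)
    }
}

matrix list R, format(%4.3f)
```

Typing -return list- after any single -spearman- run shows which results are left behind and so which could be named in -from()-.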
Alternatively, we might want different sets of variables on the
rows and the columns, perhaps specifying a submatrix of the full
matrix. The option -cols()- can be used to specify variables to
appear as columns. Say we did a principal component analysis of
five variables and followed with calculation of scores:
. pca head trunk length displacement weight
. score score1-score5
. makematrix, from(r(rho)) cols(score?) : correlate head trunk length displacement weight
                  score1      score2      score3      score4      score5
    headroom   .69579216   .65541006   .28995191  -.04724258   .00263525
       trunk   .84053038    .3144061  -.42608327   .11382425   .01243294
      length   .94323831  -.20350815  -.05828833  -.22445161  -.12292224
displacement   .89424409  -.29085394   .19339097   .27602318  -.04628849
      weight   .93915804  -.28562389    .0409204  -.10426623   .15445146
Here the full correlation matrix of variables and scores, as would
be produced by -correlate-, is 10 X 10, and the submatrix produced
by -makematrix- is only 5 X 5. The default number of decimal
places is clearly ridiculous, and we would normally want to work
on the column headers. The matrix result can be left in memory as
a named matrix, and then further manipulated:
. makematrix R, from(r(rho)) cols(score?) : correlate head trunk length displacement weight
. matrix colnames R = "score 1" "score 2" "score 3" "score 4" "score 5"
. matrix li R, format(%4.3f)
R[5,5]
              score 1  score 2  score 3  score 4  score 5
    headroom    0.696    0.655    0.290   -0.047    0.003
       trunk    0.841    0.314   -0.426    0.114    0.012
      length    0.943   -0.204   -0.058   -0.224   -0.123
displacement    0.894   -0.291    0.193    0.276   -0.046
      weight    0.939   -0.286    0.041   -0.104    0.154
Another application of the -cols()- option is perhaps more
commonly desired:
. makematrix , from(r(rho) r(p)) label cols(price) : spearman mpg-foreign
                             rho           p
       Mileage (mpg)  -.55546596   7.272e-07
  Repair Record 1978   .10275187   .40082135
       Headroom (in)    .1174198   .33661622
 Trunk space (cu ft)   .42395912   .00028325
        Weight (lbs)   .50135653   .00001143
         Length (in)   .50145304   .00001138
    Turn Circle (ft)   .32117803   .00712682
Displacement (cu in)   .41612747   .00037625
          Gear Ratio   -.3053873   .01072089
            Car type   .08065421   .51002468
. makematrix , from(r(rho) r(p)) list label format(%4.3f %6.5f) sep(0) cols(price) : spearman mpg-foreign
+-------------------------------------------+
|                             rho         p |
|-------------------------------------------|
|          Mileage (mpg)   -0.555   0.00000 |
|     Repair Record 1978    0.103   0.40082 |
|         Headroom (in.)    0.117   0.33662 |
|  Trunk space (cu. ft.)    0.424   0.00028 |
|          Weight (lbs.)    0.501   0.00001 |
|           Length (in.)    0.501   0.00001 |
|      Turn Circle (ft.)    0.321   0.00713 |
| Displacement (cu. in.)    0.416   0.00038 |
|             Gear Ratio   -0.305   0.01072 |
|               Car type    0.081   0.51002 |
+-------------------------------------------+
As this example shows, we can also ask for the results to be shown
using the -list- command, which opens a wider range of
presentation possibilities. The -label- option asks for variable
labels to be shown, and the numeric variables can be assigned
display formats.
As this example also shows, we can show two or more scalar results
from each command run. This is possible in various ways. A
univariate command can be repeated, each time yielding two or more
scalars:
. makematrix, from(r(mean) r(sd) r(skewness)) : su head trunk length displacement weight, detail
                    mean          sd    skewness
    headroom   2.9932432   .84599477   .14086508
       trunk   13.756757   4.2774042   .02920342
      length   187.93243    22.26634  -.04097455
displacement    197.2973   91.837219   .59165653
      weight   3019.4595   777.19357   .14811637
. makematrix, from(r(mean) r(sd) r(skewness)) list format(%2.1f %2.1f %4.3f) sep(0) : su head trunk length displacement weight, detail
+------------------------------------------+
|                  mean      sd   skewness |
|------------------------------------------|
|     headroom      3.0     0.8      0.141 |
|        trunk     13.8     4.3      0.029 |
|       length    187.9    22.3     -0.041 |
| displacement    197.3    91.8      0.592 |
|       weight   3019.5   777.2      0.148 |
+------------------------------------------+
-makematrix- reasons in this way: The user wants three scalars,
which I will show in three columns. So I must run the command
specified in turn on each variable supplied, which I will show on
the rows. So for each variable in varlist, -makematrix- runs a
univariate command, and puts two or more scalars in the cells of
each row.
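The same reasoning can be written out as a loop. What follows is a minimal sketch of the idea under stated assumptions, not how -makematrix- is actually coded: it assumes the auto data (-sysuse auto-), and the matrix name S and the local macro names are invented for the example.

```stata
sysuse auto, clear
local vars headroom trunk length displacement weight
local k : word count `vars'

* one row per variable, one column per scalar wanted
matrix S = J(`k', 3, .)
matrix rownames S = `vars'
matrix colnames S = mean sd skewness

local i = 0
foreach v of local vars {
    local ++i
    * detail is needed for summarize to leave r(skewness) behind
    quietly summarize `v', detail
    matrix S[`i', 1] = r(mean)
    matrix S[`i', 2] = r(sd)
    matrix S[`i', 3] = r(skewness)
}

matrix list S
```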
A bivariate command can be repeated, each time yielding two or
more scalars:
. makematrix, from(r(rho) r(p)) lhs(rep78-foreign) : spearman mpg
                     rho           p
       rep78   .30982668   .00957855
    headroom  -.48660171   .00001103
       trunk  -.64977398   3.759e-10
      weight  -.85755073   1.778e-22
      length   -.8314402   4.710e-20
        turn  -.75767499   5.548e-15
displacement  -.77126724   9.009e-16
  gear_ratio   .60982891   8.061e-09
     foreign   .36289624   .00148459
-makematrix- reasons in this way: The user wants two scalars,
which I will show in two columns. So I must run the command
specified in turn on the variable supplied. The option -lhs()- is
also specified, so that must be used to supply the other variable.
Whenever -lhs()- is specified, it specifies the rows of the
matrix. That is, in this case, the rows show the results of
-spearman rep78 mpg ... spearman foreign mpg-. Notice how the
variables specified in -lhs()- appear on the left-hand side of the
varlist which -spearman- runs. (-lhs()- also names the left-hand
side of the matrix, but that is a happy accident.) This is also
allowed:
. makematrix, from(r(rho) r(p)) rhs(rep78-foreign) : spearman mpg
In this case, the rows show the results of -spearman mpg rep78 ...
spearman mpg foreign-, and are exactly the same as in the previous
example. Again, whenever -rhs()- is specified, it specifies the
rows of the matrix. Notice how the variables specified in -rhs()-
appear on the right-hand side of the varlist which -spearman- runs.
(By a small stretch, you can also think of it as naming the
right-hand side of the matrix, given that we could repeat the row
names on that side.) In other cases, which option is used may well
matter:
. makematrix, from(e(r2) e(rmse) _b[_cons] _b[mpg]) lhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg
+------------------------------------------------------+
| | r2 | rmse | _b[_cons] | _b[mpg] |
|--------------+-------+--------+-----------+----------|
| rep78 | 0.162 | 0.91 | 1.96 | 0.068 |
| headroom | 0.171 | 0.78 | 4.28 | -0.061 |
| trunk | 0.338 | 3.50 | 22.91 | -0.430 |
| weight | 0.652 | 461.96 | 5328.76 | -108.432 |
| length | 0.633 | 13.58 | 253.16 | -3.063 |
| turn | 0.517 | 3.08 | 51.30 | -0.547 |
| displacement | 0.498 | 65.52 | 435.85 | -11.201 |
| gear_ratio | 0.380 | 0.36 | 1.98 | 0.049 |
| foreign | 0.155 | 0.43 | -0.37 | 0.031 |
+------------------------------------------------------+
. makematrix, from(e(r2) e(rmse) _b[_cons] _b) rhs(rep78-foreign) list dp(3 2 2 3) abb(9) sep(0) divider : regress mpg
+--------------------------------------------------+
| | r2 | rmse | _b[_cons] | _b |
|--------------+-------+------+-----------+--------|
| rep78 | 0.162 | 5.41 | 13.17 | 2.384 |
| headroom | 0.171 | 5.30 | 29.77 | -2.830 |
| trunk | 0.338 | 4.74 | 32.12 | -0.787 |
| weight | 0.652 | 3.44 | 39.44 | -0.006 |
| length | 0.633 | 3.53 | 60.16 | -0.207 |
| turn | 0.517 | 4.05 | 58.80 | -0.946 |
| displacement | 0.498 | 4.13 | 30.07 | -0.044 |
| gear_ratio | 0.380 | 4.59 | -2.26 | 7.813 |
| foreign | 0.155 | 5.36 | 19.83 | 4.946 |
+--------------------------------------------------+
The first series of regressions predicts -rep78 ... foreign- in turn
from -mpg-. The second series predicts -mpg- from -rep78 ... foreign-
in turn. The R-square results are the same in both series, but the
root mean square errors, intercepts, and slopes are not. Note that _b
by itself has the interpretation of _b[row_variable]. -dp()- is a lazy
alternative to -format()- used to specify the number of decimal
places.
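To make the difference between the two series concrete, here is a rough sketch of the runs that each option implies. It is an illustration only, assuming the auto data (-sysuse auto-); the display formats are arbitrary and the loops are not taken from -makematrix- itself.

```stata
sysuse auto, clear

* lhs(): each listed variable is the response, mpg the predictor
foreach v of varlist rep78-foreign {
    quietly regress `v' mpg
    display %-14s "`v'" %8.3f e(r2) %10.2f e(rmse) %10.3f _b[mpg]
}

* rhs(): mpg is the response, each listed variable the predictor
foreach v of varlist rep78-foreign {
    quietly regress mpg `v'
    display %-14s "`v'" %8.3f e(r2) %10.2f e(rmse) %10.3f _b[`v']
}
```

In the second loop the slope is picked up as _b[`v'], which is what _b by itself stands for in -from()-.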
In fact -lhs()- and -rhs()- can be used to produce a series of
multivariate results. Suppose we have -weightsq-, i.e. -weight^2-.
. gen weightsq = weight^2
. makematrix, from(e(r2) e(rmse)) lhs(mpg-trunk length-foreign) list dp(3 2) sep(0) divider : regress weight weightsq
+------------------------------+
| | r2 | rmse |
|--------------+-------+-------|
| mpg | 0.672 | 3.36 |
| rep78 | 0.222 | 0.89 |
| headroom | 0.236 | 0.75 |
| trunk | 0.457 | 3.20 |
| length | 0.900 | 7.12 |
| turn | 0.736 | 2.29 |
| displacement | 0.826 | 38.90 |
| gear_ratio | 0.577 | 0.30 |
| foreign | 0.379 | 0.37 |
+------------------------------+
This series predicts -mpg ... foreign- in turn from -weight- and
-weightsq-. Whenever either -lhs()- or -rhs()- is specified, it
defines the rows, which vary from run to run, while the varlist
supplied after the command is held fixed for each run.
There is one more nuance to be explained. Say you want a table of
sums for a set of variables. You might try
. makematrix, from(r(sum)): su head trunk length displacement weight, meanonly
However, -makematrix- cannot distinguish between this and a
similar problem with a bivariate command, so it will attempt to
run -summarize- on all distinct pairs of variables. This will
succeed, except that what is left behind in -r(sum)- will be the
sum of the second of each pair of variables. What you will prefer
is a vector, and that is the option to specify:
. makematrix, from(r(sum)) vector: su head trunk length displacement weight, meanonly
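The pitfall with pairs is easy to check directly. In this small sketch, which assumes the auto data (-sysuse auto-) and uses scalar names invented for the purpose, -summarize- run on two variables leaves behind results for the last-named variable only:

```stata
sysuse auto, clear

* with two variables, summarize leaves results for the last one only
quietly summarize headroom trunk, meanonly
scalar pairsum = r(sum)

* the same variable on its own
quietly summarize trunk, meanonly
scalar trunksum = r(sum)

display pairsum " " trunksum
```

The two numbers displayed agree, which is why a matrix built from pairs would be misleading here and a vector is what is wanted.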
There's more, for which please see the help as usual.
Nick