Thanks to Kit Baum, a program -moments- is now
available in a package of the same name from
SSC. Stata 8.2 is required.
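For anyone who has not installed from SSC before, a minimal sketch of the usual route follows. It assumes a net-aware Stata with an internet connection; the package name is the one given above.

```stata
* install the package from SSC (needs an internet connection)
ssc install moments
* then read the help file for the syntax and options
help moments
```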
-moments- has been mentioned a couple of times
in recent postings to Statalist. The point was
made that if you are doing something like -sktest-,
you should also look at the skewness and kurtosis
(a graph too, naturally).
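As a rough sketch of that advice using official commands only (the choice of the auto data and of -qnorm- as the graph is mine, not from the postings mentioned):

```stata
sysuse auto, clear
* the test ...
sktest price
* ... and the numbers behind it; -summarize, detail- leaves them in r()
summarize price, detail
display "skewness = " %5.3f r(skewness) "   kurtosis = " %5.3f r(kurtosis)
* ... and a graph: normal quantile plot
qnorm price
```

Note that kurtosis as reported here is not "excess" kurtosis: a normal distribution has kurtosis 3 on this definition.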
-moments- calculates number of observations, mean,
standard deviation, skewness and kurtosis for a list
of variables.
Your reaction to that is likely to be one or both of
two things:
(1) Surely -summarize- does that already.
(2) Surely -tabstat- is already available for customised
tables of summary statistics.
If you thought that, you are correct. The merits
of -moments- are purely matters of convenience or presentation.
-summarize- with the -detail- option produces these measures, but
together with a lot of other stuff:
. su price, detail
                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188
-tabstat- is the obvious answer to that problem.
. tabstat price-foreign, c(s) s(n mean sd skew kurt)
    variable |         N      mean        sd  skewness  kurtosis
-------------+--------------------------------------------------
       price |        74  6165.257  2949.496  1.653434  4.819188
         mpg |        74   21.2973  5.785503  .9487176  3.975005
       rep78 |        69  3.405797  .9899323 -.0570331  2.678086
    headroom |        74  2.993243  .8459948  .1408651  2.208453
       trunk |        74  13.75676  4.277404  .0292034  2.192052
      weight |        74  3019.459  777.1936  .1481164  2.118403
      length |        74  187.9324  22.26634 -.0409746   2.04156
        turn |        74  39.64865  4.399354  .1238259  2.229458
displacement |        74  197.2973  91.83722  .5916565  2.375577
  gear_ratio |        74  3.014865  .4562871  .2191658  2.101812
     foreign |        74  .2972973  .4601885  .8869686  1.786713
----------------------------------------------------------------
When I see a table like that, I want fewer decimal places. I
tend to go for 3, and on some criteria that is way too many:
. tabstat price-foreign, c(s) s(n mean sd skew kurt) format(%4.3f)
    variable |         N      mean        sd  skewness  kurtosis
-------------+--------------------------------------------------
       price |    74.000  6165.257  2949.496     1.653     4.819
         mpg |    74.000    21.297     5.786     0.949     3.975
       rep78 |    69.000     3.406     0.990    -0.057     2.678
    headroom |    74.000     2.993     0.846     0.141     2.208
       trunk |    74.000    13.757     4.277     0.029     2.192
      weight |    74.000  3019.459   777.194     0.148     2.118
      length |    74.000   187.932    22.266    -0.041     2.042
        turn |    74.000    39.649     4.399     0.124     2.229
displacement |    74.000   197.297    91.837     0.592     2.376
  gear_ratio |    74.000     3.015     0.456     0.219     2.102
     foreign |    74.000     0.297     0.460     0.887     1.787
----------------------------------------------------------------
That is clearly better, but some small details are irritating.
1. If I use a non-default -format()-, I get it everywhere. (My
punishment is that I got what I asked for.) In the case of
number of observations, this looks a little silly. As -tabstat-
accepts at most frequency or analytical weights, that column N
is always going to contain integers. I've previously suggested
that -tabstat- be modified to ignore -format()- in the case of N,
but to no effect.
2. That's the only control over small details of
presentation that you get. (You can transpose the table, which
is on occasion very useful.)
The default output of -moments- is like this:
. moments
-----------------------------------------------------------------------
n = 69                 |       mean          SD    skewness    kurtosis
-----------------------+-----------------------------------------------
Price                  |   6146.043    2912.440       1.688       5.032
Mileage (mpg)          |     21.290       5.866       0.995       3.997
Repair Record 1978     |      3.406       0.990      -0.057       2.678
Headroom (in.)         |      3.000       0.853       0.197       2.144
Trunk space (cu. ft.)  |     13.928       4.343      -0.044       2.159
Weight (lbs.)          |   3032.029     792.851       0.118       2.073
Length (in.)           |    188.290      22.747      -0.076       2.000
Turn Circle (ft.)      |     39.797       4.441       0.071       2.228
Displacement (cu. in.) |    198.000      93.148       0.581       2.354
Gear Ratio             |      2.999       0.463       0.279       2.109
Car type               |      0.304       0.464       0.850       1.723
-----------------------------------------------------------------------
In -moments- the default format is %9.3f. Well, I like that.
Also, by default casewise deletion is used: statistics are computed only
for observations with no missing values on any of the variables. The constant
n = 69 can thus be tucked away in a corner. That's the other way
round from -summarize- or -tabstat-, which use all available observations
for each variable. Naturally, you can get the opposite behaviour if you wish:
. moments, allobs
-----------------------------------------------------------------------------------
Variable               |          n        mean          SD    skewness    kurtosis
-----------------------+-----------------------------------------------------------
Price                  |         74    6165.257    2949.496       1.653       4.819
Mileage (mpg)          |         74      21.297       5.786       0.949       3.975
Repair Record 1978     |         69       3.406       0.990      -0.057       2.678
Headroom (in.)         |         74       2.993       0.846       0.141       2.208
Trunk space (cu. ft.)  |         74      13.757       4.277       0.029       2.192
Weight (lbs.)          |         74    3019.459     777.194       0.148       2.118
Length (in.)           |         74     187.932      22.266      -0.041       2.042
Turn Circle (ft.)      |         74      39.649       4.399       0.124       2.229
Displacement (cu. in.) |         74     197.297      91.837       0.592       2.376
Gear Ratio             |         74       3.015       0.456       0.219       2.102
Car type               |         74       0.297       0.460       0.887       1.787
-----------------------------------------------------------------------------------
The number of observations remains shown as an integer. You can specify up
to four numeric formats, which apply in order to the mean, standard
deviation, skewness and kurtosis; any not specified stay at the default.
Here only the first two are changed:
. moments, format(%2.1f %2.1f)
-----------------------------------------------------------------------
n = 69                 |       mean          SD    skewness    kurtosis
-----------------------+-----------------------------------------------
Price                  |     6146.0      2912.4       1.688       5.032
Mileage (mpg)          |       21.3         5.9       0.995       3.997
Repair Record 1978     |        3.4         1.0      -0.057       2.678
Headroom (in.)         |        3.0         0.9       0.197       2.144
Trunk space (cu. ft.)  |       13.9         4.3      -0.044       2.159
Weight (lbs.)          |     3032.0       792.9       0.118       2.073
Length (in.)           |      188.3        22.7      -0.076       2.000
Turn Circle (ft.)      |       39.8         4.4       0.071       2.228
Displacement (cu. in.) |      198.0        93.1       0.581       2.354
Gear Ratio             |        3.0         0.5       0.279       2.109
Car type               |        0.3         0.5       0.850       1.723
-----------------------------------------------------------------------
You'll notice the variable labels, shown by default. You can override
that too:
. moments, format(%2.1f %2.1f) variablenames
-------------------------------------------------------------
n = 69       |       mean          SD    skewness    kurtosis
-------------+-----------------------------------------------
price        |     6146.0      2912.4       1.688       5.032
mpg          |       21.3         5.9       0.995       3.997
rep78        |        3.4         1.0      -0.057       2.678
headroom     |        3.0         0.9       0.197       2.144
trunk        |       13.9         4.3      -0.044       2.159
weight       |     3032.0       792.9       0.118       2.073
length       |      188.3        22.7      -0.076       2.000
turn         |       39.8         4.4       0.071       2.228
displacement |      198.0        93.1       0.581       2.354
gear_ratio   |        3.0         0.5       0.279       2.109
foreign      |        0.3         0.5       0.850       1.723
-------------------------------------------------------------
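Going back to casewise deletion for a moment: if you want the same sample from -tabstat-, something like the following sketch should do it. It assumes the -rowmiss()- function of -egen-, which is the name in Stata 9 and later (earlier releases call it -rmiss()-), and nmiss is just a throwaway variable name of my choosing.

```stata
sysuse auto, clear
* count missing values across the variables for each observation
egen nmiss = rowmiss(price-foreign)
* keep only observations complete on every variable
tabstat price-foreign if nmiss == 0, c(s) s(n mean sd skew kurt) format(%9.3f)
drop nmiss
```

Every row should then show N as 69 (formatted as 69.000, as complained about above).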
-moments- is also just smart enough to filter out any string variables
fed to it, rather than choking on them (-tabstat-) or giving a line
of output flagging 0 observations (-summarize-).
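If you want -tabstat- to behave similarly, one way is to select the numeric variables first. This is a sketch only, assuming a version of official -ds- that supports the -has()- option:

```stata
sysuse auto, clear
* ds leaves the names of the numeric variables in r(varlist)
ds, has(type numeric)
tabstat `r(varlist)', c(s) s(n mean sd skew kurt) format(%9.3f)
```

In the auto data this drops the string variable make and passes the rest to -tabstat-.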
There are some other features too, but that's enough on -moments-.
Nick