st: -distinct- updated on SSC
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: -distinct- updated on SSC
Date: Wed, 21 Mar 2012 18:06:11 +0000
Thanks to Kit Baum as usual, -distinct- by Gary Longton and myself has been updated on SSC. -distinct- requires only Stata 8, and the revised version may be installed using -ssc- or -adoupdate-.
-distinct- was also published through the Stata Journal, and the corresponding paper

SJ-8-4  dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
        (help distinct if installed) . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

is visible -- under Stata Journal's three-year moving window -- to all at

http://www.stata-journal.com/sjpdf.html?articlenum=dm0042
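
The paper's own code is not reproduced here. As a minimal sketch of what a first-principles count can look like, one common approach tags the first observation in each group of identical values and then counts the tags. The variable name first and the choice of rep78 are illustrative assumptions only.

```stata
sysuse auto, clear
* tag the first observation in each group of identical rep78 values,
* leaving observations with missing rep78 untagged
bysort rep78: gen byte first = (_n == 1) & !missing(rep78)
* the number of tagged observations is the number of distinct nonmissing values
count if first
```

The count should agree with the number of distinct values that -distinct rep78- reports by default.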
The update to the program will also appear in Stata Journal 12(2), in about three months' time, so for now the version on SSC is the most recent version that is publicly available.
The update adds two filter options, -max()- and -min()-, either or both of which may be specified to restrict which variables are displayed. For example, -distinct, max(2)- restricts display to those variables with at most 2 distinct values.

. sysuse auto
(1978 Automobile Data)

. distinct

              |        Observations
              |      total   distinct
--------------+----------------------
         make |         74         74
        price |         74         74
          mpg |         74         21
        rep78 |         69          5
     headroom |         74          8
        trunk |         74         18
       weight |         74         64
       length |         74         47
         turn |         74         18
 displacement |         74         31
   gear_ratio |         74         36
      foreign |         74          2

. distinct, max(2)

              |        Observations
              |      total   distinct
--------------+----------------------
      foreign |         74          2

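
As a sketch of the two options used together, assuming -min()- is inclusive in the same way that -max(2)- means at most 2, the following restricts display to variables with between 3 and 20 distinct values. The bounds and the varlist are arbitrary choices for illustration.

```stata
sysuse auto, clear
* display only variables with at least 3 and at most 20 distinct values
distinct, min(3) max(20)
* a varlist may be given too: screen selected variables for binary candidates
distinct mpg-foreign, max(2)
```
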
By the way, missing values are ignored by default (hence the total of 69 rather than 74 for rep78 above), but may optionally be included in calculations.
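
A minimal sketch of the contrast, assuming the option is named -missing- as documented in the help for -distinct-: rep78 is missing in 5 of the 74 observations in the auto data, so the two calls should report different totals.

```stata
sysuse auto, clear
* default: observations with missing rep78 are ignored
distinct rep78
* include missing values in the calculations
distinct rep78, missing
```
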
I wanted these options badly for myself a few weeks ago when trying to get to grips with a large and awkwardly named dataset and wanting to focus quickly on which variables were likely to be group identifiers. Serendipitously, the extra options are also pertinent to recent threads that turn on the identification of binary variables, as was mentioned in earlier discussion -- and indeed to a very recent thread on singleton variables.
Nick
[email protected]