Thanks to Kit Baum, a new package -catplot-
has been posted on SSC. It produces plots
of categorical data in Stata 8, specifically
bar or dot charts showing frequencies,
fractions or percents.
(For Stata 7 or earlier there are other
user-written programs available in the same
territory, such as -fbar-, -tabhbar-, -vbar-.)
Those who have looked at Stata 8's new
graphics may well ask: Surely all that is very
well done in Stata 8, with -graph bar-, -graph
hbar- and -graph dot- offering a great range of
possibilities?
The answer is "Yes indeed", and that is
what I am building on, the aim being to add
a convenience command in one particular
area.
I work a lot with students and others who want bar
charts of categorical data, for example, of counts
of categories from one-way, two-way or even three-way
tables from questionnaires and other survey data.
In addition, many of these users want to tell me
for some reason that it's very easy in Excel, so
I really want to be able to say to them that it's
also very easy in Stata.
How does Stata size up on this task?
1. -histogram- is optimised for histograms,
naturally. It can be used for this purpose by
invoking options like
    , discrete xla(, valuelabel ang(45)) gap(50)
for a one-way table or
    , discrete xla(, valuelabel ang(45)) gap(50) by(myvar, rows(1))
for a two-way table. Typing this -- or issuing
the equivalent through a dialog -- is a
little more complicated than some Stata beginners
might expect for this task. In any case,
some problems then frequently arise:
a. It doesn't take much for value labels to become
unreadable or to require what I call giraffe graphics,
in which the graphic necessitates a great deal of neck
movement. (That's why I have "ang(45)" in the examples
above.)
b. The number of cells you can show easily and effectively
appears to be ~20, given that you will want value
labels shown to indicate the categories. Any long
value labels make this problem worse.
c. Representing a 3-way table seems impossible, except by
producing and then combining separate histograms.
2. -graph hbar- and its siblings are good _if_ the
frequencies come predefined as a variable, because
then you can just sum the frequencies. But
if you want Stata to do the counting for you,
this seems to require you to set up something
to count. In particular,
. graph hbar (count) rep78
doesn't give you the frequencies of the
categories of -rep78-. Roughly, we want -graph-
here to -contract-, not -collapse-.
The way to do it is to calculate something in
advance, as in
. gen freq = 1
. graph hbar (count) freq, over(rep78)
but arguably we shouldn't have to do that.
And as for showing percents, catching missing
values, and working with -if- and -in-: all
that really needs a program.
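As a rough sketch of the -contract- route
hinted at above, assuming the auto data shipped
with Stata (the dataset is chosen here purely for
illustration; -_freq- is the variable that
-contract- creates by default):
    . sysuse auto, clear
    . preserve
    . contract rep78
    . graph hbar (asis) _freq, over(rep78)
    . restore
This shows the frequencies, but -contract- replaces
the data in memory, hence the -preserve- and
-restore-. That is more work than a beginner
should need.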
So that's the rationale for -catplot-. What it
actually does can be seen by reading the help
. ssc type catplot.hlp
and then, if interested, you can install it with
. ssc inst catplot
Nick
P.S. Choosing good names is not always
easy. Perhaps this one is down partly
to the fact that I like cats.