st: composite labels with -graph hbar-, -graph bar-, -graph dot-
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: composite labels with -graph hbar-, -graph bar-, -graph dot-
Date: Thu, 29 Aug 2013 16:41:12 +0100
This post grows out of questions asked by Mike Cross. Mike deserves all
the credit for an interesting problem. I've focused on the essence as I
see it, but if it is twisted beyond recognition, the debit is to be
attributed to me.
For the record, the threads started by Mike begin at
http://www.stata.com/statalist/archive/2013-08/msg01253.html
http://www.stata.com/statalist/archive/2013-08/msg01275.html
but I've tried to make this self-contained.
The essence of the problem is (nice) display of composite axis labels
with -graph hbar- (or -graph bar- or -graph dot-). The problem extends
to include -twoway-, but the solutions do not.
A composite axis label here could arise from a combination of two (or
possibly three) variables. One context is hybrid graph-tables.
To make things concrete, consider the foreign cars from the auto dataset:
. sysuse auto, clear
. keep if foreign
For a dataset like this, the identifier (here -make-) is almost always
something we want to show on something like a bar or dot chart. We might
also want to show numeric information on the axes.
One argument here is that this way you get the best of both worlds: the
graphical part of the display shows the general pattern and the details,
and you can look up the exact values too.
The conservative objection that graphs are graphs and tables are tables,
and ne'er the twain shall meet, is thus being firmly rebutted. For more
on this, if you wish, see Cox, N.J. 2008. Between tables and graphs.
Stata Journal 8: 269-289
http://www.stata-journal.com/article.html?article=gr0034
Maarten Buis pointed to an -axis()- function from -egenmore- (SSC) which
is a helper function to combine variables to create a variable that can
be used as a single graph axis (hence the name). However, that's only a
partial solution as in effect it creates a variable with value labels
such as "Maarten 42" and "Mike 567" or "42 Maarten" and "567 Mike" which
won't line up well in general. (The user-programmer being maligned here
for a partial solution is myself.)
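
To see why labels of that kind line up poorly, here is a minimal sketch
that builds a composite string label by hand. It is a stand-in for the
effect described, not a call to -axis()- itself, and the variable name
-namevalue- is made up for the illustration.

```stata
sysuse auto, clear
keep if foreign

* namevalue is a hypothetical name: make and price pasted into one string
gen namevalue = make + "  " + string(price)

* one bar per car; the price text starts wherever each make happens to end
graph hbar (asis) price, over(namevalue, sort(price))
```

Because the makes differ in length, the prices do not form a neat column
next to the bars.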
There are better ways. First we look at a simple solution that often
works. Clone the variable you want to show:
. clonevar price2 = price
. graph hbar (asis) price, over(make) over(price2, gap(*0.5)) nofill
Taking that more slowly,
1. -make- is an identifier, with a distinct value for each
observation.
2. Combining two -over()- options instructs -graph- to show all the
cross-combinations of the variables named, but in this case and many
others several cross-combinations do not exist in the data, so the
-nofill- option is crucial to remove the gaps that would be created in
the graph. If you forget the -nofill- option, the graph may not even
be readable. In this example, there are as many distinct values of
-price- as observations, so -graph- would be trying to show 22 * 22 =
484 bars, whereas only 22 bars are defined by the data.
3. Two -over()- options by default imply thin bars because of two sets
of gaps. That can be tuned to taste. -gap(*0.5)- is one choice.
4. Using -clonevar- rather than -generate- ensures that variable and
value labels in particular are carried over. That may not be needed, but
it does no harm.
5. Note that
. graph hbar (asis) price, over(make) over(price, gap(*0.5)) nofill
doesn't work, as the attempt to get -price- to play two roles is asking
too much, but a clone does the job.
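
The arithmetic in point 2 can be checked directly. This is only a sketch:
it counts the observations and the distinct values of -price-, then
multiplies them to get the number of cells that two -over()- options
would lay out without -nofill-. The local macro names are arbitrary.

```stata
sysuse auto, clear
keep if foreign

* number of observations, i.e. bars actually defined by the data
quietly count
local n = r(N)

* number of distinct values of price (rows of a one-way table)
quietly tabulate price
local k = r(r)

display "bars defined by the data: `n'"
display "cells laid out without -nofill-: " `n' * `k'
```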
However, this trick is sensitive to whether the values of the
quantitative variable (here -price-) are all distinct, with no ties.
Consider instead -mpg-, for which there are 13 distinct values for the
22 observations of foreign cars in this dataset.
. clonevar mpg2 = mpg
. graph hbar (asis) mpg, over(mpg2) over(make, gap(*0.5) sort(mpg)) nofill
will work, but
. graph hbar (asis) mpg, over(make) over(mpg2, gap(*0.5)) nofill
may not be what you want, since cars with tied values of -mpg- are then
grouped under one shared label.
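
Before relying on the clone trick, it may help to check for ties. The
following sketch counts the distinct values of -mpg- and lists the cars
that share a value. The variable name -ntied- is made up for the
illustration.

```stata
sysuse auto, clear
keep if foreign

* distinct values of mpg among the foreign cars
quietly tabulate mpg
display "distinct values of mpg: " r(r)

* ntied = number of other observations with the same mpg
duplicates tag mpg, generate(ntied)
sort mpg make
list make mpg if ntied > 0, sepby(mpg)
```

If the list is empty, the values are all distinct and the simple clone
approach should be safe.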
There is, however, a direct solution to ensure that tied values are
shown distinctly. Think first of the sort order you want for your graph.
Here we simply go for sorting on -mpg-:
. sort mpg
The easy first part is that the order of observations is now the order
you want for your graph axis:
. gen axis = _n
The values of -axis- come from the observation numbers and so are integers
from 1 up. The more challenging second part is that we want to see the
values of -mpg- on the graph, for which the solution is to use value labels.
. labmask axis, values(mpg)
shows off a helper command -labmask-, which should be installed from the
Stata Journal archives. (-search labmask, sj- shows that it was
discussed in the 2008 paper mentioned earlier.) The perhaps whimsical
command name is intended to convey that a variable wears a "mask" which
is visible from outside.
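
For anyone curious about what the mask amounts to, here is a rough
sketch of the same effect using only official commands. It is not how
-labmask- is actually coded, just an illustration of the idea, and the
name -axis_manual- is made up so as not to clash with -axis- above.

```stata
sysuse auto, clear
keep if foreign
sort mpg

* integers 1 up, in the order wanted on the graph axis
gen axis_manual = _n

* attach the value of mpg in each observation as the label for that integer
forvalues i = 1/`=_N' {
    label define axis_manual `i' "`=mpg[`i']'", add
}
label values axis_manual axis_manual

list axis_manual mpg make in 1/5
```

The variable still holds 1, 2, 3, ... underneath, so tied values of
-mpg- remain distinct categories, but what is shown is the value of -mpg-.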
. graph bar (asis) mpg, over(axis) over(make, gap(*0.5) sort(mpg)) nofill
. graph bar (asis) mpg, over(make) over(axis, gap(*0.5)) nofill
are both now in reach, and you can choose accordingly.
Here are all the commands gathered together for anyone who wishes to run
them as a miniature tutorial. I have added -name()- options after each
graph call so that the graphs may all be compared. -labmask- (SJ) must
be installed first.
sysuse auto, clear
keep if foreign
clonevar price2 = price
graph hbar (asis) price, over(make) over(price2, gap(*0.5)) nofill name(g1)
graph hbar (asis) price, over(make) over(price, gap(*0.5)) nofill ///
title(this doesn't work as hoped!) name(g2)
clonevar mpg2 = mpg
graph hbar (asis) mpg, over(mpg2) over(make, gap(*0.5) sort(mpg)) ///
nofill name(g3)
graph hbar (asis) mpg, over(make) over(mpg2, gap(*0.5)) nofill name(g4)
sort mpg
gen axis = _n
labmask axis, values(mpg)
graph hbar (asis) mpg, over(axis) over(make, gap(*0.5) sort(mpg)) ///
nofill name(g5)
graph hbar (asis) mpg, over(make) over(axis, gap(*0.5)) nofill ///
name(g6)
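
A practical footnote on the -name()- options: without the -replace-
suboption, -name(g1)- fails if a graph called g1 is already in memory,
so re-running the commands in the same session needs the old graphs
dropped first. A minimal sketch follows; the graph name g0 is made up
for the illustration.

```stata
* clear any graphs left in memory by an earlier run
graph drop _all

sysuse auto, clear
keep if foreign

* g0 is a hypothetical name, chosen not to clash with g1 to g6
graph hbar (asis) mpg, over(make, sort(mpg)) name(g0)

* list the graphs held in memory, then bring one back to the front
graph dir, memory
graph display g0
```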
Nick
[email protected]