Thanks to Kit Baum, a new package -linkplot- has been
added to SSC. This requires Stata 8.
To install, type
. ssc inst linkplot
in an up-to-date net-aware version of Stata 8.
-linkplot- draws linked (i.e. connected) scatter plots.
How does this differ from what is already available through
the -connect()- option? In principle, not at all; in practice,
it saves you some fiddly preparation of the data.
Let's dive straight into an example. Box, Hunter and Hunter
(1978, p.100) gave data for 10 boys on the wear of shoes
made using materials A and B. The data are also analysed by
Wild and Seber (2000, p.446). The units are not specified.
One natural data structure would be something like this:
      A      B    id
   13.2   14.0     1
    8.2    8.8     2
   10.9   11.2     3
   14.3   14.2     4
   10.7   11.8     5
    6.6    6.4     6
    9.5    9.8     7
   10.8   11.3     8
    8.8    9.3     9
   13.3   13.6    10
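For anyone wanting to follow along, the data can be typed in
directly. What follows is only a sketch of one way to do that,
written as do-file lines (so without the ". " prompt), with the
values copied from the table above.

```stata
* enter the shoe wear data: one observation per boy
clear
input A B id
13.2 14.0  1
 8.2  8.8  2
10.9 11.2  3
14.3 14.2  4
10.7 11.8  5
 6.6  6.4  6
 9.5  9.8  7
10.8 11.3  8
 8.8  9.3  9
13.3 13.6 10
end
list, noobs
```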
Broadly speaking, variations within boys (same boy,
different shoes) are smaller than variations between
boys, but of more interest. (The design assigns materials
randomly to left and right feet, to guard against any
effect of "left shoeness" or "right shoeness".)
Graphically, therefore, we need ways of showing the data
that let us appreciate the fine structure. Some
possibilities are provided by -pairplot- on SSC, and
-linkplot- provides others.
This data structure permits some Stata graphs,
but inhibits others. A scatter plot such as
. scatter A B
may be useful, but does not allow easy decoding of the
difference, say A - B, which is here, and elsewhere with
paired data, likely to be of central interest.
Similarly, it is difficult to read off ratios such as
A / B. If A and B are plotted versus id, or vice versa,
the resulting graphs suffer from the arbitrariness of id.
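The differences and ratios are easy enough to compute in this
wide structure, even if the raw scatter plot does not show them
directly. A minimal sketch follows, as do-file lines; the names
diff, ratio and avg are illustrative only and appear nowhere
else in this post, and plotting differences against pair means
is just one possible display.

```stata
* differences, ratios and pair means (wide structure: one row per boy)
generate diff  = A - B
generate ratio = A / B
generate avg   = (A + B) / 2
list id A B diff ratio, noobs

* one way to make the differences visible: plot them against the pair means
scatter diff avg, yline(0)

* tidy up so that the extra variables do not get in the way later
drop diff ratio avg
```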
Other possibilities are available after a -reshape-:
. rename A wearA
. rename B wearB
. reshape long wear, string i(id) j(j)
. encode j, gen(material)
       id   material   wear
  1.    1          A   13.2
  2.    1          B     14
  3.    2          A    8.2
  4.    2          B    8.8
  5.    3          A   10.9
  6.    3          B   11.2
  7.    4          A   14.3
  8.    4          B   14.2
  9.    5          A   10.7
 10.    5          B   11.8
 11.    6          A    6.6
 12.    6          B    6.4
 13.    7          A    9.5
 14.    7          B    9.8
 15.    8          A   10.8
 16.    8          B   11.3
 17.    9          A    8.8
 18.    9          B    9.3
 19.   10          A   13.3
 20.   10          B   13.6
Now we can plot -wear- and -material- on
different axes. (-material- was produced
by -encode-, so is numeric underneath
its value labels.)
But with this data structure, any connections will
typically not be all vertical or all horizontal. As it
happens, you can use -connect()- for virtually any kind of
connection, so long as the data have been put in the right
sort order and (for some problems) missing values have been
inserted wherever you do not want points to be connected.
That is a fairly awkward "so long as", which is why
-linkplot- codifies the nitty-gritty.
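To give a flavour of the do-it-yourself route, here is a
minimal sketch for the simplest case, assuming the long
structure above with variables id, material and wear. It relies
on -connect(L)-, which joins successive points only while the
x variable is increasing, so sorting on id and then material
breaks the line between boys. It illustrates the general idea
only and is not a description of what -linkplot- does
internally.

```stata
* put each boy's two observations together, A before B
sort id material

* connect(L) draws a line to the next point only if x increases,
* so material 1 -> 2 is linked, but 2 -> 1 (the next boy) is not
twoway connected wear material, connect(L) xla(1 2, valuelabel) xsc(r(0.5 2.5))
```

The same device does not carry over to the graph with material
on the y axis, because the x variable (wear) need not increase
within each boy's pair of points.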
Some possibilities are
. linkplot material wear, link(id) yla(1 2, valuelabel) ysc(r(0.5 2.5)) yla(, ang(h))
. linkplot wear material, link(id) xla(1 2, valuelabel) xsc(r(0.5 2.5)) yla(, ang(h))
The general idea is that you need to specify a -link()- variable
defining groups to be linked. Usually this will be
some sort of identifier variable, so the idea has
panel data applications.
Some of the tricks for getting data in the right sort
order are discussed in rather dusty old FAQs at
http://www.stata.com/support/faqs/graphics/connect.html
http://www.stata.com/support/faqs/graphics/vplplot.html
although Stata 8 adds a nicer way to do it all,
through -cmissing()-, which is in fact the main
trick within -linkplot-.
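For the curious, the -cmissing()- route can be sketched by hand.
The lines below (do-file style) assume the long structure above
with variables id, material and wear. The variable gap is
hypothetical, introduced only for this illustration, and the
whole thing is a guess at the flavour of the approach rather
than a copy of the code inside -linkplot-.

```stata
preserve

* add one extra observation per boy, to act as a break in the line
generate byte gap = 0
expand 2 if material == 2
bysort id material: replace gap = 1 if _n == 2
replace wear = . if gap
replace material = . if gap

* each boy's two points come first, then that boy's missing values
sort id gap material

* cmissing(n) means lines are not drawn across missing values
twoway connected material wear, cmissing(n) ///
    yla(1 2, valuelabel ang(h)) ysc(r(0.5 2.5))

restore
```

Because the break comes from the missing values and not from
the direction of x, this works whichever variable is on which
axis.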
More technicalities are covered in the help file.
Vince Wiggins provided encouraging noises
as I worked my way towards this.
Box, G.E.P., W.G. Hunter and J.S. Hunter. 1978.
Statistics for experimenters: an introduction to design,
data analysis, and model building. New York: John Wiley.
Wild, C.J. and G.A.F. Seber. 2000.
Chance encounters: a first course in data analysis
and inference. New York: John Wiley.
Nick