John Plummer wrote:
> I am using boxplots (graph box x1 x2 x3..., Stata version 8.1, Win 98) to
> plot several variables. Some plots seem not to show all the data. For
> example, for the following variable x:
>
> . tab x
>
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |        125       78.13       78.13
>           2 |         20       12.50       90.63
>           3 |         15        9.38      100.00
> ------------+-----------------------------------
>       Total |        160      100.00
>
> the command "graph box x" shows only the median line at 1, with no whiskers
> or outside values to indicate the data points at 2 or 3.
>
> Can anyone suggest how I might get boxplots showing the full range of the
> data?
With these data, the upper and lower quartiles
are both 1, and so the so-called step, i.e.
1.5 * (upper quartile - lower quartile), is
0. So the values 2 and 3 lie beyond upper quartile
+ step = 1, and should be plotted individually, as you
imply.
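As a check on that arithmetic, here is a minimal sketch that rebuilds a variable with the frequencies in the quoted -tab- output and computes the step. Only the frequencies come from the original post; generating the 160 observations from _n is my own assumption for illustration.

```stata
* Rebuild the quoted frequencies: 125 ones, 20 twos, 15 threes
* (constructing x from _n is an assumption for illustration only)
clear
set obs 160
gen x = 1 + (_n > 125) + (_n > 145)
tab x

* -summarize, detail- leaves the quartiles in r(p25) and r(p75)
summarize x, detail
display "lower quartile = " r(p25) "  upper quartile = " r(p75)
display "step = " 1.5 * (r(p75) - r(p25))
```

Because more than 75% of the values are 1, both quartiles should be reported as 1 and the step as 0.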
I have two reactions:
1. This looks like a bug. Somehow the IQR of 0 is
getting trapped or ignored, either directly or
indirectly. For example, if -x1- is your variable
and I type in Stata
. gen x2 = x1 + smidgen
. gra box x2
then everything looks OK. (I had
to be less poetic: instead of -smidgen-, I
find that e.g. 1e-6 * uniform() works all
right.) You will also want the
descriptive information for -x1- to show
up on the graph: there are various ways
to do this, of which a prior -copydesc x1 x2-
is one. (-copydesc- is on SSC.)
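Spelled out, the workaround might look like the sketch below. The example data and the variable label are hypothetical, and copying the label through a local macro is just one alternative to -copydesc-, not necessarily what that program does internally.

```stata
* Illustrative data with the quoted frequencies, here called x1
clear
set obs 160
gen x1 = 1 + (_n > 125) + (_n > 145)
* Hypothetical label, only so that there is something to copy
label variable x1 "Example variable"

* Add a tiny random amount so the quartiles are no longer exactly tied
gen x2 = x1 + 1e-6 * uniform()

* Carry the variable label over from x1 to x2
local lbl : variable label x1
label variable x2 "`lbl'"

graph box x2
```

The perturbation is far too small to be visible on the graph. In more recent versions of Stata the random-number function is spelled runiform().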
2. No box plot is going to be more than
a line and two points with these data. I guess
they are just data concocted to show the problem,
but if your real data are anything like this,
a discrete histogram or dotplot should do a better
job of showing the data.
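For instance, with the same illustrative data (again built from _n, which is my assumption and not part of the original post), the two alternatives could be sketched as:

```stata
* Illustrative data with the quoted frequencies
clear
set obs 160
gen x = 1 + (_n > 125) + (_n > 145)

* One bar per distinct value, with counts on the vertical axis
histogram x, discrete frequency

* Alternatively, one dot per observation, stacked at each value
dotplot x
```

Either graph shows all three values and their relative frequencies directly.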
Nick