Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: calculating the whiskers on a boxplot using -twoway-
From
"Sheena Sullivan" <[email protected]>
To
<[email protected]>
Subject
st: calculating the whiskers on a boxplot using -twoway-
Date
Thu, 21 Mar 2013 10:40:21 +1100
I am creating boxplots using -twoway- rather than -graph boxplot- because I
want to display the mean values and it was suggested in a previous Statalist
post to follow the (great) guidance provided in N.J. Cox "Speaking Stata:
Creating and varying box plots." SJ 9(3):478-496.
However, I'm a little confused about the guidance for calculating the
whiskers. The Cox article suggests that the length of the whiskers can be
generated using the following (syntax for upper whisker only is shown):
. bysort year: egen upper=max(min(logIC50, upq+1.5*iqr)) ///(eq1)
logIC50 is my variable of interest, upq is its upper quartile, loq is its
lower quartile and iqr is its inter-quartile range.
The upper limit, upper, should be the largest value (of an observation
within the dataset) not greater than upq + 1.5*iqr
and the lower limit, lower, should be the smallest value not less than loq -
1.5*iqr
But they aren't.
I noticed the problem because if I draw my plots using the -twoway- commands
in the Cox article (see end of email) the whiskers are longer than when
drawn using -graph box-. So I did a manual inspection of the data and
sometimes the values generated by eq1 represent a value in the dataset and
at other times they do not.
If I instead use the following syntax:
. gen upper1=upq+1.5*iqr
///(eq2)
. bysort year: egen upper2=max(logIC50) if logIC50<upper1 ///(eq3)
I can generate a graph using the -twoway- commands in the Cox article with
whiskers that look the same as when I draw it using -graph box-
However, to get my outliers to appear, I need to include a -scatter- and
using the upper2 and lower2 variables doesn't work because the if statement
in eq4 means no value is generated for lower2 and upper2 for those
observations outside the range. A work around is to fill the gaps by
creating another variable of the mean (or min or max because they are all
the same) of the other observations within that by group:
. bysort year: egen upper3=mean(upper2)
. bysort year: egen lower3=mean(lower2)
But this getting to be an awful lot of syntax just to add a mean to a
boxplot!
So, is there a better way to generate the whiskers? And can anyone help me
understand why eq1 sometimes produces results that do not correspond to a
value in the dataset?
Thanks,
Sheena
Syntax for drawing the boxplot using -twoway-
. twoway rbar med upq year, pstyle(p1) blc(gs4) bfc(gs9) blw(vthin)
barw(0.35) || /// upper half of box
> rbar med loq year, pstyle(p1) blc(gs4) bfc(gs9) blw(vthin)
barw(0.35) || /// lower half of box
> rspike upq upper year, pstyle(p1) || /// upper whisker
> rspike loq lower year, pstyle(p1) || /// lower whisker
> rcap upper upper year, pstyle(p1) msize(*2) || /// cap the
upper whisker
> rcap lower lower year, pstyle(p1) msize(*2) || /// cap the
lower whisker
> scatter mean year, ms(+) msize(1) mlw(vvthin) || /// mean
> scatter logIC50 year if !inrange(logIC50, lower, upper),
ms(O) mc(gs4) msize(1) mlw(vthin) /// outliers
> legend(off) ///
> yla(, ang(h)) ytitle(Log IC50) xtitle("") ///
> yscale(range(-2 2.5)) ylabel(-2 -1 0 1 2) title(B)
b1title(Year) ///
> name(p_B, replace) yline(.53 1.53, lcolor(black)
lpattern(dash)) ///
> scheme(sj)
Syntax for -graph box-
. graph box logIC50, over(year) ///
> name(pB, replace) yline(.53 1.53, lcolor(black) lpattern(dash))
///
> yscale(range(-2 2.5)) ylabel(-2 -1 0 1 2) title(B) b1title(Year)
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/