I almost totally agree with Steve's advice. He uses the word Winsorize a
little more widely than is standard. (By the way, I can assure anyone
who reads that FAQ that the misbegotten word "gotten" did not appear in
my original draft.)
I'd favour making the omission of outliers a little more evident. In
this and some other respects -stripplot, box- is more flexible than
-graph box- or -graph hbox-. -stripplot- is downloadable from SSC.
Consider as an example -price- in the auto dataset.
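
    sysuse auto

    * cap price at 14000 in a copy, leaving the original variable intact
    clonevar price2 = price
    replace price2 = 14000 if price2 > 14000

    * label the capped value "outliers" on the axis
    stripplot price2, over(foreign) box center stack width(250) ///
        xla(4000(2000)12000 14000 "outliers")

    * flag the capped points and give them a different marker symbol
    gen outliers = price > 14000
    stripplot price2, over(foreign) box center stack width(250) ///
        xla(4000(2000)12000 14000 "outliers") ///
        separate(outliers) ms(oh S) legend(off)
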
Nick
[email protected]
Try the -nooutside- option or switch to another scale and show
everything. See: Nick Cox's FAQ at
http://www.stata.com/support/faqs/graphics/boxandlog.html . What he
demonstrates can apply to scales other than the log.
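
A minimal sketch of the first suggestion, assuming the auto dataset shipped with Stata stands in for the poster's data (-price- and -foreign- are placeholders for the business counts and population groups):

```stata
sysuse auto, clear

* draw boxes and whiskers as usual, but suppress the markers
* for outside values
graph box price, over(foreign) nooutside
```

The boxes and whiskers are still computed from all the data; only the outside markers are left off the graph.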
If you want to show some of the outside points, but not all, you will
have to Winsorize the points you want to hide. Replace them with a
value at the upper end of your desired graph range and give them an
invisible marker symbol. This will leave the rest of the boxplot
unchanged. You can add text at that value to show the number of
higher points excluded.
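
The capping step can be sketched roughly as follows. This assumes the auto dataset and a cutoff of 14000, both chosen only for illustration, and it reports the number of capped points in a note rather than attempting the invisible marker symbol:

```stata
sysuse auto, clear

* hypothetical cutoff, chosen only for illustration
local cutoff = 14000

* count the points that will be pulled down to the cutoff
* (missing values count as large in Stata, so exclude them)
count if price > `cutoff' & !missing(price)
local nhigh = r(N)

* cap high values in a copy; values at or below the cutoff are unchanged
clonevar price2 = price
replace price2 = `cutoff' if price > `cutoff' & !missing(price)

* say in a note how many points were capped
graph box price2, over(foreign) ///
    note("`nhigh' values above `cutoff' are plotted at `cutoff'")
```

So long as the cutoff lies above the upper quartile in every group, the medians and quartiles are the same as for the original variable. The upper whisker can still move if the cutoff falls inside the range the whisker is allowed to reach.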
This problem comes up for other commands in which Stata computes the
plotting points; -stcurve- is an example. Stata has a -range()-
suboption for axis scales (as in -yscale(range())-), but it can only
expand, not contract, the plotting range.
On Thu, Jul 16, 2009 at 3:09 AM, Dana Chandler wrote:
> I am preparing some graphs with simple boxplots over various groups.
> Thus on my x-axis, I have categorical variables for population groups.
> My y-axis has # of businesses of a certain type within each population
> group.
>
> Unfortunately, I would like to be able to only show the y-axis within
> a certain range (so as to not have outliers distort the picture). One
> idea I had was to simply do the graph and add "IF #businesses < 50".
> This will make the graph visible, but will distort the IQR of the
> boxplot. The "yscale(r(0 25))" command does not seem to work and seems
> only to "extend" a range of y-values rather than restrict it. Does
> anyone have a suggestion for how to construct a graph for the entire
> range of data but only display it over a specific range?