Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: RE: statalist-digest V4 #4588 - was Graphing |
Date | Tue, 24 Jul 2012 13:38:14 +0100 |
Nick

On 24 Jul 2012, at 13:22, David Hoaglin <dchoaglin@gmail.com> wrote:
Allan,

I resisted the temptation to mention a trellis arrangement, in part because the bars to be compared would not have a common baseline. Stacked bars usually fail because the bars are stacked.

I was referring to empirical evidence, rather than authority. Bill Cleveland and colleagues conducted studies of people's ability to make accurate comparisons in various graphical-perception tasks. They arrived at the following ordering, from most accurate to least accurate (Cleveland 1985, Table 4.3):

1. Position along a common scale
2. Position along identical, nonaligned scales
3. Length
4. Angle --- Slope
5. Area
6. Volume
7. Color hue --- Color saturation --- Density

In a stacked bar chart, the comparisons are among lengths (unless some variable is encoded in them, the widths carry no information). Why not move up in the hierarchy when one can?

I agree that it is important to distinguish data that are ranks from data that are grades. Also, as often, some analysis is likely to be helpful in understanding the patterns in the data and deciding what to present.

David Hoaglin

W. S. Cleveland. The Elements of Graphing Data. Wadsworth Advanced Books and Software, 1985.

W. S. Cleveland and R. McGill. Graphical perception: theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association 1984; 79:531-554.

W. S. Cleveland and R. McGill. Graphical perception and graphical methods for analyzing scientific data. Science 1985; 229:828-833.

On Tue, Jul 24, 2012 at 6:22 AM, Allan Reese (Cefas) <allan.reese@cefas.co.uk> wrote:

The answers offered to Aminu may be helpful but they appear to ignore the contradiction in the question as posed:

[Aminu] I have a qualitative data where 8 diseases ranked (1 (most-important) to 5 (least important)) based on perception - 37 subjects were interviewed so 37 records in the dataset.

It appears the diseases were not *ranked* but *graded*, an important distinction.
If ranked, only one disease can be first for each person, and the ranks would run 1-8. If graded, could one person think all eight had the same importance? Before offering code to draw specific graphs, it is necessary to know the intended use of the graph (analysis or presentation); if the latter, what message is it intended to convey? For example, a slide to make the point "everyone thinks this disease is important, but this one is considered trivial" might well use eight stacked bars and vivid colours.

[David Hoaglin] "A key message is that stacked bars are generally a bad idea. You would do better with a little histogram for each disease (5 bars, each sitting on the horizontal axis) and no numbers on top. This approach has the advantage that the 8 bars for each rank, though not adjacent, have a common baseline."

[AR] 8 histograms side by side may be less easy to compare than 8 stacked vertically in a trellis. This is like comparing "profiles" in correspondence analysis. Rather than take authoritarian advice, Aminu might try both and see which conveys the intended message. This may be culturally dependent (relating to direction of reading) and I think Aminu is in Nigeria.

[Nick Cox] My bottom line is that stacking is often chosen, but rarely optimal. One disadvantage of stacking is that it imposes dependence on a key or legend.

[AR] I'll agree with Nick that random choices from a "graphics gallery" are often confused and confusing. Stacked bars often fail because the author didn't think about the order of stacking - Excel stacks values in alphabetic order of labels. Stata makes it easy to choose the order and here the values are inherently ordered, so reference to a legend is less problematic. Since the values for each reply are discrete (no. of 1s, no. of 2s etc), another option is a line chart for each disease, so the slope between values helps the visual comparison of "profiles".

I'll guess the data were input as 37 cases with a variable for the importance of each disease.
If -reshaped- as 37*8 cases with subject/disease/importance as variables, -tab disease importance- gives the table of counts that the graph is intended to illustrate. Hence one can do some analysis before drawing a graph to present the findings.

Allan
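The -reshape- and -tab- steps Allan describes might be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the variable names (subject, imp1-imp8, disease, importance) are hypothetical, and the grades are simulated at random because Aminu's data were not posted. The final command draws one small histogram per disease in a single row, so that bars share a baseline; changing rows(1) to cols(1) would give the vertically stacked trellis instead, for anyone who wants to try both.

```stata
* Sketch only: variable names and simulated grades are assumptions,
* not Aminu's actual dataset.
clear
set seed 2012
set obs 37                              // 37 subjects, one record each
generate subject = _n

* One variable per disease, graded 1 (most important) to 5 (least important)
forvalues j = 1/8 {
    generate imp`j' = ceil(5 * runiform())
}

* Wide to long: 37 * 8 = 296 observations of subject/disease/importance
reshape long imp, i(subject) j(disease)
rename imp importance

* The table of counts that any graph is intended to illustrate
tabulate disease importance

* One small histogram per disease, panels side by side on a common baseline
histogram importance, discrete frequency xlabel(1/5) ///
    by(disease, rows(1) note(""))
```

With real data, the -tabulate- output is worth inspecting before choosing any graph, since it shows directly whether one subject gave the same grade to several diseases.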
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/