From | Marcello Pagano <[email protected]> |
To | [email protected] |
Subject | Re: st: RE: Histograms (was: Multiple (overlaid) Histogram) |
Date | Thu, 29 May 2003 18:45:54 -0400 |
R. Allan Reese

On Thu, 29 May 2003, Nick Cox wrote:

... Empirical. You will see histograms with unequal widths particularly in older books and papers, and the reason was that data for them came already grouped in such classes. There's an example in Snedecor and Cochran's venerable text. That seems far less common today when more and more data sets are available in raw, ungrouped form, modulo confidentiality constraints. I don't see people asking for this often on Statalist, and one good reason for this being low down in priority is that it is in practice rarely needed.

The linked issue is whether it is strictly true, as Nick previously commented, that "adjacent bars touch. (If this isn't true, you haven't got a proper histogram.)" In a histogram, it is unit area that represents the weight of data. Hence a class interval that is widened should be proportionately reduced in height. I suggest this is a "design decision" which has implications for the message conveyed by a graph. Consider a data series such as 4,4,4,5,5,5,5,5,6,6,6,6,16 and use for convenience unit-width bins. Stata's histogram command shows the 16 as a single observation and, implicitly, as an outlier. If you don't allow zero-height bars but demand that adjacent bars touch, the upper bin might run from 7 to 16 with a height of 0.11 and the data now look like a skewed distribution with a long upper tail. Neither version is more "correct" absolutely, though one may be more appropriate to an interpretation of the data.

Hence, I would support adding the option in the software, e.g. a new option "classes(4,5,6,7,16)" or "width(1,1,1,9)", to allow irregular spacing. The user then has control of the design choice, rather than being compelled by the software (writer). Excel has particularly abhorrent approaches to the choice and labelling of bins for histograms.

As for the definition, I am more than happy with the idea that some bars are of zero height and touch other bars which may be of zero or non-zero height.
That is, my definition emphatically does not rule out gaps between bars: they are just populated by bars of zero height. (Or, if you like, the principle that adjacent bars touch does not rule out the possibility of bars not being adjacent.)

As for an extra option, it is easy to specify this as a desired item, but my guess is that implementing this on top of the existing -histogram- command would be far more of a labour than the real benefits imply. That's Stata Corp's problem, but it could be enough to push this a long way down the list of priorities.

In addition, I doubt that all users have as much graphical sense as Allan. This option could be a gateway to lots of rather silly histograms, and although one shouldn't rule out syntax on the grounds that it might be abused, I feel queasy at the prospect.

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
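
The arithmetic behind the two design choices discussed above can be checked with a short sketch. This is plain Python, not Stata, and the helper name `bar_heights` is made up for illustration. The `classes()` and `width()` options quoted in the thread are proposals, not existing -histogram- syntax. Heights here are counts per unit of bin width, which appears to be the scale on which the wide upper bin has height 0.11; dividing further by the 13 observations would give a density scale.

```python
# Allan's example series from the message above.
data = [4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 16]


def bar_heights(values, edges):
    """Return one (left, right, count, height) tuple per bin.

    Hypothetical helper for illustration. Bins are half-open [left, right),
    except the last one, which also includes its right edge. The height is
    count / width, so each bar's area (height * width) equals its count.
    """
    bars = []
    last = len(edges) - 2  # index of the final bin
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        if i == last:
            count = sum(1 for v in values if left <= v <= right)
        else:
            count = sum(1 for v in values if left <= v < right)
        bars.append((left, right, count, count / (right - left)))
    return bars


# Design choice 1: unit-width bins throughout, zero-height bars allowed.
unit_bars = bar_heights(data, list(range(4, 18)))

# Design choice 2: irregular classes with one wide upper bin from 7 to 16.
wide_bars = bar_heights(data, [4, 5, 6, 7, 16])

for label, bars in (("unit-width bins", unit_bars), ("irregular classes", wide_bars)):
    print(label)
    for left, right, count, height in bars:
        print(f"  {left:>2} to {right:>2}: count={count}, height={height:.2f}")
```

With unit-width bins the single observation at 16 sits apart, separated from the rest by a run of zero-height bars. With the irregular classes the same observation is spread over a bar of width 9, which is what produces the apparent long upper tail.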
--
______________________________________________________________________
Marcello Pagano
Biostatistics Department               Tel: 1-617-432-4911
Harvard School of Public Health        Fax: 1-617-739-1781
655 Huntington Avenue                  email:[email protected]
Boston, MA 02115                       http://biosun1.harvard.edu/~bio200
USA                                    eppur si muove