Stata The Stata listserver

Re: st: RE: Histograms (was: Multiple (overlaid) Histogram)


From   Marcello Pagano <[email protected]>
To   [email protected]
Subject   Re: st: RE: Histograms (was: Multiple (overlaid) Histogram)
Date   Thu, 29 May 2003 18:45:54 -0400

As I have said before, I, like Allan Reese, would very much like an option for equi-probability histograms. I think that this is especially useful when thinking of the histogram as an estimator of the underlying density function of a continuous variable. I do not see a strong argument, other than tradition (laziness?), for being constrained to histograms with equi-spaced bins.
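
A minimal sketch of what an equi-probability histogram could compute, written in Python purely for illustration. The function name equiprob_histogram and the sample data are hypothetical. It assumes bin edges are placed at sample quantiles, so each bin holds about the same share of the observations, and bar heights are densities (share divided by width).

```python
from bisect import bisect_right
from statistics import quantiles


def equiprob_histogram(data, k):
    """Return (edges, heights) for k bins holding roughly equal shares of data.

    Heights are densities: (share of observations in the bin) / (bin width),
    so the bar areas sum to 1.
    """
    xs = sorted(data)
    # Interior edges are the sample k-quantiles; outer edges are min and max.
    edges = [xs[0]] + quantiles(xs, n=k, method="inclusive") + [xs[-1]]
    counts = [0] * k
    for x in xs:
        # Bins are [lo, hi), except the last, which also includes the maximum.
        counts[min(bisect_right(edges, x) - 1, k - 1)] += 1
    heights = []
    for c, lo, hi in zip(counts, edges, edges[1:]):
        if hi <= lo:
            raise ValueError("tied values give a zero-width bin; use fewer bins")
        heights.append(c / len(xs) / (hi - lo))
    return edges, heights


# Hypothetical right-skewed sample: the squares of 1..20.
sample = [i * i for i in range(1, 21)]
edges, heights = equiprob_histogram(sample, 4)
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges, edges[1:])]
```

Every bar has the same area here, so all the information about the density is carried by the bin widths: narrow, tall bars where the data are dense and wide, low bars in the tail.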

m.p.



Nick Cox wrote:

R. Allan Reese


On Thu, 29 May 2003, Nick Cox wrote:

... Empirical. You will see histograms with unequal widths
particularly in older books and papers, and the reason was
that data for them came already grouped in such classes. There's
an example in Snedecor and Cochran's venerable text.
That seems far less common today when more and more data sets are
available in raw, ungrouped form, modulo confidentiality
constraints. I don't see people asking for this often on
Statalist, and one good reason for this being low down in priority
is that it is in practice rarely needed.
The linked issue is whether it is strictly true, as Nick previously
commented, that "adjacent bars touch. (If this isn't true, you
haven't got a proper histogram.)"  In a histogram, it is unit area
that represents the weight of data.  Hence a class interval that is
widened should be proportionately reduced in height.

I suggest this is a "design decision" which has implications for the
message conveyed by a graph.  Consider a data series such as
4,4,4,5,5,5,5,5,6,6,6,6,16 and use for convenience unit-width bins.
Stata's histogram command shows the 16 as a single observation and,
implicitly, as an outlier.  If you don't allow zero-height bars but
demand that adjacent bars touch, the upper bin might run from 7 to 16
with a height of 0.11 and the data now look like a skewed
distribution with a long upper tail.  Neither version is more
"correct" absolutely, though one may be more appropriate to an
interpretation of the data.
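
The arithmetic behind that example can be sketched as follows, in Python for illustration only. The helper name bar_heights is hypothetical, and the sketch assumes heights are measured as counts per unit of bin width, which is the scale on which one observation spread over the bin from 7 to 16 gets a height of 1/9, about 0.11.

```python
from bisect import bisect_right


def bar_heights(data, edges):
    """Return count-per-unit-width heights for bins [edges[i], edges[i+1]).

    The last bin is closed on the right so that the maximum is included.
    """
    counts = [0] * (len(edges) - 1)
    for x in data:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the bin edges")
        i = min(bisect_right(edges, x) - 1, len(counts) - 1)
        counts[i] += 1
    # Dividing by the width keeps bar area proportional to the count.
    return [c / (hi - lo) for c, lo, hi in zip(counts, edges, edges[1:])]


data = [4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 16]

# Design A: unit-width bins throughout; empty bins become zero-height bars.
unit = bar_heights(data, list(range(4, 18)))

# Design B: one wide upper bin running from 7 to 16.
wide = bar_heights(data, [4, 5, 6, 7, 16])
```

Both designs give the bars a total area of 13, one unit per observation. They differ only in whether the 16 appears as an isolated bar of height 1 after a run of zero-height bars, or as a low bar of height 1/9 stretching from 7 to 16.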

Hence, I would support adding the option in the software, e.g. a new
option "classes(4,5,6,7,16)" or "width(1,1,1,9)", to allow irregular
spacing.  The user then has control of the design choice, rather than
being compelled by the software (writer).

Excel has particularly abhorrent approaches to the choice and
labelling of bins for histograms.

As for the definition, I am more than happy with the idea that some
bars are of zero height and touch other bars which may be of zero
or non-zero height. That is, my definition emphatically does not
rule out gaps between bars: they are just populated by bars of
zero height. (Or, if you like, the principle that adjacent bars
touch does not rule out the possibility of bars not being adjacent.)

As for an extra option, it is easy to specify this as a desired item,
but my guess is that implementing this on top of the existing
-histogram- command would be far more of a labour than the real
benefits imply. That's Stata Corp's problem, but it could be enough
to push this a long way down the list of priorities.

In addition, I doubt that all users have as much graphical sense
as Allan. This option could be a gateway to lots of rather silly
histograms, and although one shouldn't rule out syntax on the
grounds that it might be abused, I feel queasy at the prospect.

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

--
______________________________________________________________________

Marcello Pagano
Biostatistics Department			Tel: 1-617-432-4911
Harvard School of Public Health		        Fax: 1-617-739-1781
655 Huntington Avenue            		email:[email protected]
Boston, MA  02115                 		http://biosun1.harvard.edu/~bio200
USA

eppur si muove




