Allan's complaining about perceived perversity, but I am not clear what he
would regard as good behaviour.
I can see a good case for arguing that with -histogram, discrete-, and nothing
else said, the default should have been -frequency-, but yoking options like
that is rarely good software design. Anyway, that wasn't done, and a change
is now more difficult to justify.
As -frequency- is just an option away, this strikes me overall as a
very small deal. I rarely get graphs right first time in any case, and
others may have had similar experiences.
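
For anyone who wants to see the behaviour side by side, here is a minimal
sketch. It assumes the auto dataset shipped with Stata, with -rep78-
standing in for Allan's discrete variable; neither comes from his example.

```stata
* Assumption: auto.dta and rep78 are stand-ins, not Allan's data
sysuse auto, clear

* Default density scale with -discrete-: bars are 1 unit wide,
* so bar heights coincide with fractions
histogram rep78, discrete

* Counts on the y axis: one extra option
histogram rep78, discrete frequency

* Default density scale with 10 bins: each bar is (5 - 1)/10 = 0.4 wide,
* and height = fraction/width, so heights can rise above 1
histogram rep78, bin(10)

* Fractions with the same 10 bins: heights sum to 1 whatever the width
histogram rep78, bin(10) fraction
```

Comparing the third and fourth graphs should show the same bars with
different y-axis labelling, which is the point at issue.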
Nick
Jann Ben
> Bang! I don't agree. The purpose of a histogram is to make
> visible the shape of a density. It is therefore natural to
> report the y-axis in terms of a density.
Allan Reese (Cefas)
> > The default "hist x" command in Stata gives a Y axis labelled
> > as a density. I've never given it much attention until I saw
> > the scale went up to 2 on a plot. Hold on, density functions
> > sum to 1 over the variable.
> >
> > Further investigation and discussion with Statacorp
> > identified that the default tries to make the "area" of the
> > bars add up to 1. If the number of bars changes, so does
> > their width and so does the Y labelling. In my example, the
> > data were discrete, so increasing the number of intervals did
> > not change the plot except to add more zero-height columns
> > and hence make each column narrower.
> >
> > hist x, bin(n)
> >     therefore caused different Y labelling with varying n
> > hist x, xscale(range(0 n))
> >     did not affect the labelling, though the bars got narrower
> >     with bigger n
> > hist x, frac and
> > hist x, discrete
> >     both gave correct labelling, and the sum of column heights was 1.
> >
> > Do other users think this is perverse behaviour, especially
> > as the default? My take is that, when drawing a histogram,
> > the column width is taken as an arbitrary unit, not directly
> > related to the x-scale. The implication is that you need to
> > scale the height only when there are mixed-width columns, but
> > would not label the Y axis in "freq/absolute-width" units.
> > Having "densities" that vary and are in such peculiar units
> > (1/locust in my example!) does not seem helpful.
> >
> > Shoot me down