
st: RE: interpreting output of kdensity command

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: interpreting output of kdensity command
Date	Thu, 7 Aug 2003 15:25:18 +0100

Kimberley Tran
> 
> To build kernel density graphs in Stata, I created a Do-file for
> purpose of generating a variable within which density measures are
> taken. This variable contained 100 points. From my understanding, the
> distance between each point is the bandwidth. I ran this Do-file prior
> to using the kdensity command.
> In the resulting kernel density graphs, there are points on the y-axis
> which are greater than 1. How should the y-axis of the resulting kernel
> density graphs be interpreted? Is it the frequency of the distribution?

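First off, the grid mesh (the spacing of the points at 
which the density is evaluated) is not the same as the 
bandwidth (the width of the kernel that does the smoothing). 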
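To make the distinction concrete, here is a minimal Python sketch, not Stata code. It assumes a Gaussian kernel for brevity (-kdensity-'s default kernel may differ), and the helpers `gaussian_kde` and `grid` and the sample `data` are hypothetical. The sketch evaluates one estimate on a coarse and a fine grid, then changes the bandwidth:

```python
import math


def gaussian_kde(data, h):
    """Return f(x), a Gaussian-kernel density estimate of `data` with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f


def grid(lo, hi, m):
    """m equally spaced evaluation points from lo to hi (the grid mesh)."""
    step = (hi - lo) / (m - 1)
    return [lo + i * step for i in range(m)]


data = [1.0, 1.5, 2.0, 2.2, 3.1, 4.0]  # hypothetical sample

f_narrow = gaussian_kde(data, h=0.5)
f_wide = gaussian_kde(data, h=1.0)

coarse = grid(0.0, 5.0, 11)   # mesh spacing 0.5
fine = grid(0.0, 5.0, 101)    # mesh spacing 0.05

# x = 2.5 lies on both grids: the estimate there is the same whatever the mesh,
# because the grid only decides where the curve is evaluated.
print(f_narrow(coarse[5]), f_narrow(fine[50]))

# Changing the bandwidth changes the estimate itself.
print(f_narrow(2.5), f_wide(2.5))
```

A finer grid only draws the same curve more smoothly. The bandwidth is what controls how smooth the curve actually is.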

-kdensity- produces a smoothed estimate 
of the probability density function. The 
units of probability density are the reciprocal 
of the units of the variable whose distribution
you are examining. If that variable is measured 
in metres, the units are 1 / m; if in years, the
units are 1 / yr. The density cannot be negative; 
beyond that, the only constraint is that the total 
area under the probability density function must 
be 1. It is perfectly possible for individual 
ordinates to exceed 1, so the y-axis shows neither 
a frequency nor a probability. 
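A minimal Python sketch of both points, under stated assumptions. It uses a Gaussian kernel, a trapezoidal area, and hypothetical helper names (`gaussian_kde`, `trapezoid`, `density_curve`) and hypothetical data squeezed into a narrow range. The ordinates climb far above 1 while the area stays at 1. Re-expressing the same values in units 1000 times smaller divides every ordinate by 1000 but leaves the area unchanged, because density carries the reciprocal of the variable's units:

```python
import math


def gaussian_kde(data, h):
    """Gaussian-kernel density estimate of `data` with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


def trapezoid(xs, ys):
    """Area under the piecewise-linear curve through the points (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))


def density_curve(data, h, m=1000):
    """Evaluate the estimate on m points reaching 6 bandwidths past the data."""
    lo, hi = min(data) - 6 * h, max(data) + 6 * h
    xs = [lo + i * (hi - lo) / (m - 1) for i in range(m)]
    f = gaussian_kde(data, h)
    return xs, [f(x) for x in xs]


# Hypothetical values in a narrow range (compare gpm, roughly 0.02 to 0.09).
data = [0.030, 0.040, 0.045, 0.050, 0.055, 0.060, 0.070]
h = 0.01

xs, ys = density_curve(data, h)
print("max ordinate:", max(ys))        # far above 1
print("area:", trapezoid(xs, ys))      # very close to 1

# Same data in units 1000 times smaller (e.g. metres to millimetres):
# the values grow by 1000, the ordinates shrink by 1000, the area is unchanged.
scale = 1000.0
xs_k, ys_k = density_curve([v * scale for v in data], h * scale)
print("max ordinate:", max(ys_k))
print("area:", trapezoid(xs_k, ys_k))
```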

For example, 

. sysuse auto, clear
. gen gpm = 1 / mpg 
. kdensity gpm 

I see a density estimate that averages about 15
over a range of about 0.09 - 0.02 = 0.07. Roughly, 
15 * 0.07 = 1.05, which is about 1, and I am confident 
that a closer estimate would be nearer 1. (With the 
default choices, a little of the area is usually lost 
in the extreme tails.) 
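The back-of-the-envelope check (average height times width is about 1) can be mimicked in a short Python sketch. The sketch assumes a Gaussian kernel, a hypothetical helper `gaussian_kde`, hypothetical data, and an evaluation range reaching 4 bandwidths past the data. It is not the gpm data, which is not reproduced here:

```python
import math


def gaussian_kde(data, h):
    """Gaussian-kernel density estimate of `data` with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


# The rough check from the gpm example: an average height of about 15 over a width of about 0.07.
rough_gpm = 15 * 0.07
print(rough_gpm)

# The same idea on a hypothetical sample: average ordinate over an evenly
# spaced grid, times the width of that grid, should come out close to 1.
data = [0.030, 0.040, 0.045, 0.050, 0.055, 0.060, 0.070]
h = 0.01
lo, hi = min(data) - 4 * h, max(data) + 4 * h
m = 200
xs = [lo + i * (hi - lo) / (m - 1) for i in range(m)]
f = gaussian_kde(data, h)
mean_height = sum(f(x) for x in xs) / m
print("average height x width:", mean_height * (hi - lo))
```

The average-height shortcut is a crude Riemann sum, so it lands near 1 rather than exactly on it. Trimming the grid to the data's own minimum and maximum would cut off tail area, as noted above.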

The units of the density are 

	1 / (gallons per mile)
	= miles per gallon 

and the units of the variable are by construction 

	gallons per mile 

Area under the curve (density times width) has no 
units, as can be seen by cancelling down:

	miles       gallons
	-----    *  -------
	gallons     miles 

There is a note on this at [R] p.227. 

David Finney wrote a very nice paper on "Dimensions
in statistics" in Applied Statistics 25, 285-289 (1977). 

Nick 
[email protected] 



Follow-Ups:
- Re: st: RE: interpreting output of kdensity command
  - From: Roger Harbord <[email protected]>

References:
- st: interpreting output of kdensity command
  - From: Kimberley Tran <[email protected]>
