From | "Nick Cox" <n.j.cox@durham.ac.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: kdensity with few (/aggregated) data points |
Date | Wed, 30 Jun 2010 17:29:14 +0100 |
I wouldn't read anything of statistical substance into the differences. It looks as if -kdensity- and -twoway kdensity- have different graphical defaults for drawing the estimated density, one using connected lines and the other something smoother, in essence a cubic spline. I agree with your implied puzzlement: it's not obvious why that should be so, but the difference is in any case a matter of presentation.

It's a real stretch to get a decent density function estimate out of any sample of the order of 10 observations, and no statistical magic (white or otherwise) can help much there.

I think there is a marginal advantage to using -kdensity- directly and ignoring a histogram. Binning of about 10 points can hardly be anything but capricious, and when you have that few there is no reason not to show all the raw data in addition to any density estimate.

Nick
n.j.cox@durham.ac.uk

Amy

I just thought to re-phrase my question. I've noticed that if I have very few data points (e.g. 10) then kdensity gives me something jagged even if I specify a Gaussian kernel (regardless of the bandwidth). If the reason I have so few data points is because I have aggregate data, e.g. data for each decile of a population, is there any way to make this smoother? Why is it that

histogram X, bin(10) kdensity kdenopts(gauss)

will give me something that looks smoother?
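
On the jagged versus smooth appearance discussed in this thread, a minimal sketch may help. It assumes the difference comes from how many grid points the density is evaluated at before the points are joined by lines: as far as I recall from the documentation, -kdensity- defaults to min(N, 50) evaluation points, so about 10 here, while -twoway kdensity- defaults to many more. That reading is an assumption, not something confirmed in the thread, and the data, seed, and variable names below are invented purely for illustration.

```stata
* Hypothetical data: 10 values standing in for decile-level aggregates
clear
set obs 10
set seed 2010
generate x = rnormal()

* Default call: with only 10 observations the curve is drawn through
* few evaluation points, so it can look jagged
kdensity x, kernel(gaussian)

* Same estimator, evaluated on a finer grid via n();
* 200 is an arbitrary illustrative choice
kdensity x, kernel(gaussian) n(200)

* Show the raw data alongside the estimate, plotted along the
* horizontal axis at height zero
generate zero = 0
twoway (kdensity x, kernel(gaussian) n(200)) (scatter zero x), legend(off)
```

If this reading is right, raising n() changes only how the curve is drawn, not the estimate itself, which is consistent with the difference being a matter of presentation. It does nothing to make 10 observations more informative.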