Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: How to plot cdf after corrected kernel density
From: philippe van kerm <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: How to plot cdf after corrected kernel density
Date: Fri, 4 Oct 2013 12:30:46 +0000
Nick is right: -akdensity- does nothing specific to address lower and upper bounds for data on a restricted range.
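I guess the short answer Monica may be looking for is to use -integ- instead of -cumul- (but see Nick's point about methods that are smart about estimating the PDF yet crude about integrating it). -cumul- returns the empirical CDF of whatever variable it is given, here the density estimates themselves, whereas -integ- numerically integrates the density over the grid points: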
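sysuse auto
* b holds the density estimates, a the grid points
_kdens mpg, g(b a)
* integrate b over a to obtain the cumulative distribution
integ b a, gen(cb)
line cb a, sort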
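
For readers without -kdens- installed, the same idea can be sketched with official -kdensity-. This is an illustration only: it uses the default kernel and bandwidth, applies no boundary correction, and the variable names x, d and cdf are made up for the example.

```stata
sysuse auto, clear
* x = grid points, d = density estimates at those points
kdensity mpg, generate(x d) nograph
* running integral of d over x (trapezoid rule instead of the default cubic splines)
integ d x, generate(cdf) trapezoid
* the largest value of cdf should be close to 1 if the grid covers most of the mass
summarize cdf
line cdf x, sort
```

If the maximum of cdf falls well short of 1, the grid is probably cutting off part of the estimated density.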
Note that if interest is ultimately in the (smoothed) CDF, Monica will need a much smaller bandwidth than what would be 'optimal' for estimating the PDF.
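
A rough way to see the effect of the bandwidth on the integrated curve is to estimate the density twice on the same grid and integrate both. The sketch below uses official -kdensity-; halving the default bandwidth is an arbitrary choice for illustration, not a recommended rule, and the variable names are hypothetical.

```stata
sysuse auto, clear
* first pass: default bandwidth, which -kdensity- returns in r(bwidth)
kdensity mpg, generate(x d1) nograph
local h2 = r(bwidth)/2
* second pass: same grid points, half the bandwidth
kdensity mpg, at(x) generate(d2) bwidth(`h2') nograph
* integrate both density estimates over the common grid
integ d1 x, generate(cdf1)
integ d2 x, generate(cdf2)
line cdf1 cdf2 x, sort
```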
A transformation of the data may sometimes be a way to deal with boundary issues in kernel density estimation.
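
As one possible illustration of the transformation idea, for a variable bounded below at zero one can estimate the density of the log and transform back: if Y = ln(X), then f_X(x) = f_Y(ln x) / x. The sketch below assumes a strictly positive variable and uses official -kdensity-; a variable bounded on both sides would need a different transformation (a logit of the rescaled variable, say), and this is not the linear combination correction used by -kdens-.

```stata
sysuse auto, clear
* work on the log scale, where the lower bound at zero disappears
generate double lmpg = ln(mpg)
kdensity lmpg, generate(lx ld) nograph
* back-transform the grid and the density: f_X(x) = f_Y(ln x) / x
generate double x = exp(lx)
generate double d = ld / x
* integrate the back-transformed density over the original scale
integ d x, generate(cdf)
line cdf x, sort
```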
Philippe
> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Nick Cox
> Sent: Friday, October 04, 2013 11:07 AM
> To: [email protected]
> Subject: Re: st: How to plot cdf after corrected kernel density
>
> -akdensity- from Philippe Van Kerm (SJ) is an excellent command, but
> I don't see options to respect lower and upper bounds, as Monica's
> problem evidently requires. Philippe will correct me if I am wrong.
>
> However, her post does not dwell on this aspect and she uses an
> accessible example (-mpg- in the auto dataset), for which this problem
> does not bite.
>
> In practice, -akdensity- appears to produce estimates for the density
> for a range wider than the observed data, so that might entail
> projecting beyond the natural support of the data.
>
> The advice here depends a little on what the aim is, which could range
> from just wanting a nicer graph for display (because you don't trust
> the irregularities that are visible) to wanting numerical estimates
> too for some later purpose.
>
> Clearly there is no such thing as "the" smoothed cdf, as it is easy to
> think of several ways to get a cdf, either directly or indirectly.
>
> Also, for most purposes it would be expected that you might have to
> explain how you got a smoothed cdf. In principle, naturally, the cdf
> is just the integral of the pdf, but any method that is smart about
> calculating the pdf but crude about integrating it may not be optimal.
>
> I am fond of kernel density methods and often use them, but their
> emergence as a default or standard method seems a little accidental.
> As they are essentially local methods, they don't place a high premium
> (or indeed any at all) on global smoothness. For visualization they
> can be a little conservative which is usually an excellent thing, as
> researchers should always be on the lookout for quirky details of
> their distributions.
>
> Other methods (including logspline density estimation) work well, but
> on a quick search I can't find a Stata implementation.
>
> All that said, I still prefer estimating quantiles; it's really the
> same problem, as graphically you are just exchanging axes.
>
> Nick
> [email protected]
>
>
> On 3 October 2013 23:20, Alfonso S <[email protected]> wrote:
>
> > I suggest you download the package akdensity (st0037_3). It does an
> > adaptive kernel density and generates the cdf variable as well. Use the
> > code below to check it out.
> >
> > sysuse auto
> > akdensity mpg, g(a b) cdf(cb)
> > line cb a
> >
> > Let me know if that is what you were looking for.
>
> From: Nick Cox <[email protected]>
>
> > The bottom line in the post you cite advises
> >
> > "I prefer to get smoother cumulative distribution functions directly
> > from estimated quantiles."
> >
> > I agree with that.
>
> On 3 October 2013 21:45, Jain, Monica (HarvestPlus) <[email protected]> wrote:
>
> >> I am using -kdens- and I do not know how to plot the cumulative
> >> distribution function. I am using Stata 13 for Windows.
> >>
> >> I am using -kdens- to estimate kernel density correcting for bounded
> >> variables using the linear combination method. I want to plot the
> >> cumulative distribution function for the estimated kernel densities. On
> >> one of the Statalist threads
> >> (http://www.stata.com/statalist/archive/2005-04/msg00798.html), the
> >> following method has been suggested to plot them:
> >>
> >> sysuse auto
> >> _kdens mpg, g(b a)
> >> cumul b, g(cb)
> >> line cb b, sort
> >>
> >> With the above command, I get the densities on the x-axis, rather
> >> than the [x]. I looked all over the web to check if I can find how to
> >> do it, but I have not been successful. If I use the following command:
> >>
> >> line cb a, sort
> >>
> >> I get a weird triangle-shaped graph.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/