Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Get fitted values after locpoly (follow-up)
From
Austin Nichols <[email protected]>
To
[email protected]
Subject
Re: st: Get fitted values after locpoly (follow-up)
Date
Wed, 21 Sep 2011 11:40:04 -0400
Partho Sarkar <[email protected]> :
N.B. -lpoly- does produce an optimal bandwidth, defined in its
documentation, whereas -locpoly- (findit locpoly) uses an "entirely
inappropriate" default bandwidth, as it makes clear in its help file:
If width() is not specified,
then the "default" width is used; see [R] kdensity.
This default is entirely inappropriate for local polynomial smoothing.
Roll your own.
Hence my quotes on "optimal" bandwidth below, which I realized needed
explanation.
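
A minimal sketch of the procedure quoted below (choose a bandwidth on a
random subsample, then re-estimate on the full sample with that bandwidth
and predict out of sample). It is written in Python rather than Stata so
that it is self-contained, and it is not what -lpoly- or -locpoly- does
internally. Assumptions, none of which come from the thread: a Gaussian
kernel, a degree-1 (local linear) fit, leave-one-out cross-validation as the
stand-in for an "optimal" bandwidth, a fixed grid of candidate bandwidths,
and simulated data. The function names local_linear, loo_cv_error and
choose_bandwidth are hypothetical.

```python
import math
import random


def local_linear(x_train, y_train, x0, h):
    """Local linear (degree-1) fit at x0, Gaussian kernel, fixed bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x_train, y_train):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # kernel weight of this observation
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det < 1e-12:
        # Too few effective neighbours for a slope: fall back to the
        # kernel-weighted mean (local constant fit).
        return t0 / s0 if s0 > 0 else float("nan")
    # Intercept of the weighted least-squares line centred at x0.
    return (s2 * t0 - s1 * t1) / det


def loo_cv_error(x, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        total += (y[i] - local_linear(x_rest, y_rest, x[i], h)) ** 2
    return total / len(x)


def choose_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv_error(x, y, h))


# Simulated data (assumption): y = sin(x) + noise on [0, 10].
rng = random.Random(0)
n = 400
x = [rng.uniform(0, 10) for _ in range(n)]
y = [math.sin(xi) + rng.gauss(0, 0.3) for xi in x]

# Step 1: choose the bandwidth on a random subsample only.
sub = rng.sample(range(n), 80)
x_sub = [x[i] for i in sub]
y_sub = [y[i] for i in sub]
candidates = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
h = choose_bandwidth(x_sub, y_sub, candidates)

# Step 2: re-estimate on the full sample with that single fixed bandwidth.
fitted = [local_linear(x, y, xi, h) for xi in x]

# Step 3: predict at query points that are not in the sample.
queries = [2.5, 5.0, 7.5]
predicted = [local_linear(x, y, q, h) for q in queries]

print("bandwidth chosen on subsample:", h)
for q, p in zip(queries, predicted):
    print("x0 =", q, "predicted y0 =", round(p, 3))
```

The bandwidth stays fixed across all evaluation points, so this is not an
adaptive-bandwidth smoother. In Stata the analogous step would be to pass
the retained bandwidth to the smoother explicitly rather than relying on a
default.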
On Wed, Sep 21, 2011 at 11:34 AM, Austin Nichols
<[email protected]> wrote:
> Partho Sarkar <[email protected]> :
> You can certainly run -locpoly- (findit locpoly) or -lpoly- on a
> sample of randomly selected points, keep the "optimal" bandwidth
> chosen, and then reestimate using that bandwidth on the full sample,
> and predict out of sample as well. But they do not do adaptive
> bandwidths, if that is what you had in mind.
>
> On Wed, Sep 21, 2011 at 11:25 AM, Partho Sarkar
> <[email protected]> wrote:
>> Tania
>>
>> I think I see where you are coming from, and so just a quick pointer:
>>
>> You are probably thinking in terms of "kernel regression" (or local
>> polynomial regression) as usually understood in the machine learning
>> literature, in which the bandwidth is *optimally* selected (or
>> "tuned") from an available "training set" or "memory set" of (xi,yi)
>> points, and *this bandwidth, together with the training set data*, can
>> then be used to "predict" the y0 value at some previously unseen "query"
>> point x0 outside the training set. [In a sense, you could say that
>> the training set together with the bandwidth constitute the "model"].
>>
>> But this is clearly not how locpoly is set up. The bandwidth is
>> fixed, either by default or by your choice. And I am not sure, having
>> only tried a canned example with the program once very briefly, if
>> there is any scope to meaningfully partition the data into training
>> and query sets, as I think you might have in mind. The user interface
>> certainly does not *explicitly* give the user such a choice. [But this
>> can be clarified by those more familiar with this command.] There may
>> possibly be a roundabout way to get an approximation to what I
>> think you have in mind. But if I wanted to do the kind of kernel
>> regression I mention above, I would (without knowing what other Stata
>> programs may be available for this) go to R's CRAN archives. I worked
>> on this a few years ago, so let me know and I could try to dig up
>> some of the sources, or just search CRAN.
>>
>> Hope this helps
>>
>> Partho
>