Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
From: Alex Olssen <[email protected]>
To: [email protected]
Subject: Re: st: lpoly and nonmissing fitted values where the dependent variable is missing
Date: Thu, 12 Aug 2010 11:10:23 +1000
Thank you Yulia.
On 11 August 2010 03:55, Yulia Marchenko, StataCorp LP
<[email protected]> wrote:
> Alex Olssen <[email protected]> has a follow-up question about -lpoly-:
>
>> Does anybody know how -lpoly- chooses how far to extend fitted values
>> outside the values used for estimation? I have a feeling this should be
>> related to bandwidth but it is not clear how or why.
>>
>> The following code suggests that, with the rectangle kernel and a local
>> linear fit, -lpoly- produces fitted values up to (bwidth - 5) units beyond
>> the values used for estimation. This seems arbitrary though. Is there a
>> good reason?
>>
>> sysuse auto, clear
>> sort length
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(10) nogr gen(L10) at(length)
>> lpoly price length if length<190, ker(rec) deg(1) bwidth(20) nogr gen(L20) at(length)
>> ...
>
> -lpoly- evaluates the smooth at each specified grid point. In Alex's example,
> the grid points are determined by all the values of the -length- variable, as
> specified by the -at()- option. The range of grid values for which -lpoly-
> reports a nonmissing smoothed value is not arbitrary and is determined by how
> many observations are available to perform a (local) regression fit at each
> grid point.
>
> For each grid point, the set of values to be used in a local regression fit is
> determined by the weights which represent the "nearness" of each observation
> to the target grid point: the "further" an observation is from the grid point,
> the closer its weight is to zero. The weights are determined by both a specified
> bandwidth and a chosen kernel function; Alex can find more details about the
> actual computation in the Methods and Formulas section of the documentation
> entry for -lpoly-, -[R] lpoly- on p. 939. Only the observations with nonzero
> weights are used in a local regression fit. The fit is computed if there are
> at least two observations in a local region; otherwise, a missing value is
> returned.
>
> Returning to Alex's examples, with the rectangular kernel and the bandwidth
> equal to 10, the last grid point for which there are at least two observations
> in a local regression fit (taking into account the specified -if- restriction)
> is 195. In the example with the same kernel and the bandwidth equal to 20,
> the last such grid point is 204. As expected, holding everything else
> constant, increasing the bandwidth increases the range of grid values for
> which the smooth evaluates to a nonmissing value.
>
>
> -- Yulia
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
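One way to check the rule Yulia describes is to find the largest grid point that received a nonmissing smooth and then count how many estimation-sample observations fall within the bandwidth of it. The sketch below does this for the bwidth(10) example. It assumes the rectangular kernel gives nonzero weight only to observations strictly closer than the bandwidth; that cutoff is my assumption, so check Methods and Formulas in -[R] lpoly- before relying on it. The local macro g is just an illustrative name.

```stata
sysuse auto, clear

* Same fit as in the original example, with option names spelled out
lpoly price length if length < 190, kernel(rectangle) degree(1) ///
    bwidth(10) nograph generate(L10) at(length)

* Largest grid point at which -lpoly- returned a nonmissing smooth
summarize length if !missing(L10), meanonly
local g = r(max)
display "last grid point with a fitted value: `g'"

* Estimation-sample observations inside the window around that grid point
* (assumption: nonzero weight requires |length - g| < bandwidth)
count if length < 190 & abs(length - `g') < 10
```

If the rule holds as described, the count should be at least two at this grid point and fewer than two at the next larger value of -length-. Repeating the steps with bwidth(20) gives the corresponding check for the second example.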