While I can see Nick's point about interpolation in general, I agree
with Ben Jann that -ipolate- should not "replace" nonmissing values of
the y variable when there are multiple y values per value of x, since
its own help file says that it does not. In Ben's example, the missing
value of rep78 for mpg==19 should be filled in, but the four
nonmissing values of rep78 for mpg==15 and mpg==17 should be left alone.

I would add the following to line 75 of ipolate.ado (currently a blank line):

    qui replace `z'=`usery' if `usery'<.

and alter the first two lines to read

    *! version 1.0 6sep2005 based on ipolate, version 1.3.3 21sep2004
    program define nipolate, byable(onecall) sort

then save the revised program as nipolate.ado, which produces

    rep78   mpg   rep78i
        .    14        .
        4    15        4
        3    15        3
        3    16        3
        5    17        5
        2    17        2
        3    18        3
        3    19        3
        .    19        3
        3    19        3
        3    21        3
        .    22      3.5
        4    23        4
        4    25        4
        4    25        4
        .    26      4.1
        .    26      4.1
        5    35        5

in Ben's example.
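
For anyone unsure how to make such a copy without touching the official file, here is a rough sketch. It assumes that your current working directory is writable and is on the ado-path (it is by default), and that the line numbers in your copy of ipolate.ado match the ones I quote, which may not hold for other versions.

```stata
* locate the official ipolate.ado along the ado-path; the full path is left in r(fn)
findfile ipolate.ado
* copy it to the current directory under the new name (the official file is untouched)
copy "`r(fn)'" nipolate.ado
* open the copy and make the edits by hand (first two lines, plus line 75)
doedit nipolate.ado
* after saving, drop any cached program definitions and confirm which file Stata finds
discard
which nipolate
```

Once -which- reports the new version line, nipolate can be called with the same syntax as ipolate.
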
Alternatively, you could add an option to line 5, making it e.g.

    */ [ BY(varlist) Epolate noavg]

and make line 75:

    if "`avg'"!="" qui replace `z'=`usery' if `usery'<.

so that nipolate behaves as ipolate does unless you specify noavg.
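
If you would rather not maintain a modified ado-file at all, much the same result should be obtainable after the fact by restoring the observed values once -ipolate- has run. This is a minimal sketch, assuming the variable names from Ben's example and the full auto data rather than his random subsample, so the listing will not match his row for row.

```stata
sysuse auto, clear
* ipolate as shipped: repeated rep78 values at the same mpg are averaged
ipolate rep78 mpg, generate(rep78i)
* put the observed values back wherever rep78 is nonmissing
replace rep78i = rep78 if rep78 < .
* check that every observed value now survives unchanged
assert rep78i == rep78 if rep78 < .
```

Observations with missing rep78 keep their interpolated values, which are still based on the averaged y at each x, so only the nonmissing cases change.
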

On 9/5/05, Nick Cox wrote:
> -ipolate- interpolates linearly within gaps. That
> is, it is assumed that the y variable varies linearly with
> the x variable within any gaps. This is best seen
> geometrically, as the last (x,y) pair before any gap
> and the first (x,y) pair after any gap are just joined
> by a straight line and intermediate results read
> off directly.
>
> In addition, a consequence of the assumption that
> y is piecewise linear in x is
> that repeated y's at any x are just averaged.
>
> Plotting your results will make it easier to
> see what is going on.
>
> The help file is somewhat elliptical here.
>
> Nick
>
> > Ben Jann wrote:
> >
> > Stata's -ipolate- command produces results I don't
> > understand. Here is an example:
> >
> > . set seed 2346
> > . sysuse auto
> > . drop if rep78<. & uniform()<.8
> > . ipolate rep78 mpg, g(rep78i)
> > . sort mpg
> > . list rep78 mpg rep78i, clean
> >
> >      rep78   mpg   rep78i
> >  1.      .    14        .
> >  2.      4    15      3.5
> >  3.      3    15      3.5
> >  4.      3    16        3
> >  5.      2    17      3.5
> >  6.      5    17      3.5
> >  7.      3    18        3
> >  8.      3    19        3
> >  9.      3    19        3
> > 10.      .    19        3
> > 11.      3    21        3
> > 12.      .    22      3.5
> > 13.      4    23        4
> > 14.      4    25        4
> > 15.      4    25        4
> > 16.      .    26      4.1
> > 17.      .    26      4.1
> > 18.      5    35        5
> >
> > -help ipolate- states that rep78i should equal rep78
> > if rep78 is not missing ("ipolate creates newvar = yvar,
> > where yvar is not missing"). This is certainly not the case
> > in the above example. For some reason, "3.5" is stored
> > for cases 2, 3, 5, and 6. Can someone explain this to me?