Moon Joong
> >
> > I'm trying to find a smart way to fill missing values
> with averages
> between
> > nearest values. The data looks like the following:
> >
> > id year rates
> > 1 1960 .35
> > 1 1961 missing
> > 1 1962 .45
> > 1 1963 .47
> > 1 1964 missing
> > 1 1965 .46
> > 2 1945 .2
> > 2 1946 missing
> > 2 1947 .35
> > 2 1948 missing
> > 3 1949 missing
> > 3 1950 .56
> > .....
> >
> > What I want to do is, take the example of id 1 of year
> 1961 (which is
> > missing), to fill the average value between .35 and .45 (that is
> > (.35+.45)/2) into the missinge observation of year 1961.
> Since I have a
> lot
> > of observations to fill in, using brute force by hands
> takes forever and
> > would be inaccurate. I think it's very straightfoward,
> which I don't know
> > since I'm a beginner.
In addition to other suggestions, you could use -ipolate-.
However, you should only do this separately for each -id-.
Nevertheless you can cycle over the -id-s with -foreach-.
The following is a sketch. Also check out the -epolate-
option.
levels id, local(levels)
gen rates2 = rates
qui foreach l of local levels {
ipolate rates year if id == `l', gen(work)
replace rates2 = work if missing(rates2) & !missing(work)
drop work
}
This will fill gaps of more than one.
That's linear interpolation. Other interpolation
methods might also be tried. But linear is
right for your examples.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/