Jason,
I think that Nick is right (as always). For your particular case
try the following:
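* interpolate linearly on Year, separately within each country
bysort Country: ipolate Education Year, gen(Education_ipo)
* keep the original five-yearly values in Education_5
gen Education_5 = Education
* overwrite Education with the interpolated series
replace Education = Education_ipo
drop Education_ipo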
Rodrigo.
----- Original Message -----
From: "Jason Yackee" <[email protected]>
To: <[email protected]>
Sent: Saturday, September 09, 2006 1:45 PM
Subject: st: RE: RE: filling in missing panel data as a trend line
Nick,
Thank you for the suggestion. I don't think -ipolate- quite works for
what I have in mind, but maybe I am wrong. Here is a hypothetical
picture of the data. "Education" is simply the average total years of
education of a country's population.
Country Year Education
Mex. 1970 3.4
Mex. 1971 .
Mex. 1972 .
Mex. 1973 .
Mex. 1974 .
Mex. 1975 4.2
Mex. 1976 .
Mex. 1977 .
Mex. 1978 .
Mex. 1979 .
Mex. 1980 4.7
Nic. 1970 1.5
Nic. 1971 .
Nic. 1972 .
~~~ ~~~ ~~~
Nic. 1980 3.2
Perhaps a better way of describing what I want to do is to fill in the
years between survey dates with a sort of moving average, so that the
differences between the measured years are evenly split between the
(in-between) missing years. So for Mexico, the difference between
measured year 1975 and measured year 1970 is 4.2 - 3.4 = 0.8. To
linearly fill in the missing values, I would make 1971 = [3.4 +
(0.8*1)/5], 1972 = [3.4 + (0.8*2)/5], and so on.
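As a rough check of that arithmetic, here is a minimal sketch that compares the by-hand rule with what -ipolate- produces. It uses only the hypothetical Mexico rows for 1970-1975 shown above; the variable names byhand and Education_ipo are made up for illustration.

```stata
* Hypothetical data: the Mexico rows for 1970-1975 from the picture above
clear
input str4 Country int Year float Education
"Mex." 1970 3.4
"Mex." 1971 .
"Mex." 1972 .
"Mex." 1973 .
"Mex." 1974 .
"Mex." 1975 4.2
end

* By-hand rule: start value + (difference * years elapsed) / gap length
generate byhand = 3.4 + (4.2 - 3.4) * (Year - 1970) / 5

* -ipolate- fills linearly between the nearest non-missing points
ipolate Education Year, generate(Education_ipo)

list Year Education byhand Education_ipo
```

If the two columns agree (about 3.56 for 1971 and 3.72 for 1972), then -ipolate- is already doing the even split described, and for the full data set it would only need a by-country prefix.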
I could obviously do this by hand, but for 140 countries and 30 years
this would take some time. So I take it that I would have to write some
code to automate the process? Since I am new to code-writing, any ideas
would be very much appreciated.
Jason Yackee
Stata 9.2 Intercooled
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Saturday, September 09, 2006 3:57 AM
To: [email protected]
Subject: st: RE: filling in missing panel data as a trend line
This sounds like linear interpolation: see -ipolate-.
Panel data should be interpolated separately -by <panelid>:-.
But note that if "education" means something like "years
of education", then that case too is discussed in the
FAQ you cite in its last section, at least for people
who stay in the system and progress a year at a time.
People who repeat a year or take years out of the system
are naturally a complication.
Nick
[email protected]
Jason Yackee, PhD Candidate; J.D.
> For my panel data set I have a variable ("education") that has only
> been collected every five years. My data set is otherwise annual; I
> would like to fill in the missing data for "education" on the basis of
> a regression/trend line between each five-year observation, rather
> than using the "cascade" method detailed in this faq:
> http://www.stata.com/support/faqs/data/missing.html. I don't see a way
> to do what I want to do using -impute-. Would someone be able to
> suggest an appropriate approach?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/