Thank you, Nick. I knew of this FAQ, but wasn't sure how to difference
within an identifier so as to fill the missing cells. This looks like
what I need.
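
For the archive, here is a minimal sketch of the fill-then-difference idea
on the id 27 rows from my listing. It is a variation on the code quoted
below, not a copy of it: it uses explicit subscripts instead of the D.
operator, so it does not assume the data have been tsset. The names amo2
and dt2 are the ones used in the quoted code. The toy dataset is typed in
by hand and contains only id, year and amo.

```stata
* Toy data: the id 27 rows from the listing (amo = age in months)
clear
input id year amo
27 1997 189
27 1998 .
27 1999 226
27 2000 237
27 2001 247
27 2002 259
27 2003 273
27 2004 283
end

* Carry the last observed amo forward within id; because replace works
* row by row, the fill cascades across gaps longer than one year
gen amo2 = amo
bysort id (year): replace amo2 = amo2[_n-1] if missing(amo2)

* Difference the observed amo against the previous row's filled value;
* dt2 stays missing in years with no interview because amo is missing
by id: gen dt2 = amo - amo2[_n-1]

list id year amo amo2 dt2, sepby(id)
```

In this sketch dt2 for 1999 is 226 - 189 = 37, and it stays missing for
1998. It is also missing for 1997 here, only because the toy data has no
earlier row. One difference from the D. version: subscripts compare with
the previous row, not the previous year, so if a year's row is absent
from the data altogether this still returns a difference where D. would
return missing.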
On 11/1/07, n j cox <[email protected]> wrote:
> This problem is a twist away from one discussed in an FAQ:
>
> How can I replace missing values with previous or following nonmissing
> values or within sequences?
> http://www.stata.com/support/faqs/data/missing.html
>
> Scott only needs to fill in gaps of missings with the previous value,
> then all is plain sailing.
>
> gen amo2 = amo
> bysort id (year) : replace amo2 = amo2[_n-1] if missing(amo)
>
> Then
>
> by id : gen dt2 = cond(amo == ., ., d.amo2)
>
> I would make no claims about efficiency except that this should beat
>
> 1. any loop
> 2. fixing by hand
>
> This should also fix gaps longer than one year.
>
> Nick
> [email protected]
>
> Scott Cunningham
> --------------------------------------------------------------------------------
>
> My data is a longitudinal dataset of individuals who were interviewed
> from 1997 to 2004. I have data on individual ages (measured as months
> from birth month). Because respondents were not always interviewed
> exactly 12 months after their previous interview, I have been trying
> to control for differences in time since the last interview by
> differencing their ages like so:
>
> . gen dt=d.amo
>
> where "amo" is "age in months." I notice that this works so long as I
> have values of amo in both the current and previous year. But there
> are some people who disappear from the survey only to return a year
> later. They look like this:
>
>     +---------------------------------+
>     | id   rp   age   amo   dt   year |
>     |---------------------------------|
> 56. | 27    0    15   189   12   1997 |
> 57. | 27    .     .     .    .   1998 |
> 58. | 27    4    18   226    .   1999 |
> 59. | 27    3    19   237   11   2000 |
> 60. | 27    8    20   247   10   2001 |
> 61. | 27    6    21   259   12   2002 |
> 62. | 27    4    22   273   14   2003 |
>     |---------------------------------|
> 63. | 27    1    23   283   10   2004 |
>
> The relevant variables are: id (indicating this is the same person),
> amo (age in months on day of interview), dt (time since last
> interview), and year. Ignore the "rp" variable, but note that this
> variable measures something which depends on "dt" since it is a
> measure of something done since the date of the last interview.
>
> So, the problem is that "dt" is missing twice: once when all values are
> missing because the person was not interviewed, and again when he
> comes back in. Ideally, I would like to know how to create a
> differenced value for dt equal to (226-189) = 37, since the respondent
> is 226 months old on the day of the interview and was 189 the last time
> interviewed. What's the most efficient code to do this?
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
"A man must be orthodox on most things, or he will never have time
to practice his own particular heresy." - GK Chesterton