On Mon, 2 Dec 2002 04:24:09 -0800 (PST) Enrica Croda
<[email protected]> wrote:
> On Mon, 2 Dec 2002, Stephen P. Jenkins wrote:
>
> > On Sun, 1 Dec 2002 03:07:48 -0800 (PST) Enrica Croda
> > <[email protected]> wrote:
> >
> > <snip>
> >
> > > So, to recap, I now believe my data are grouped duration data...
> > > I understand that in this case I need to organize my data in the so-called
> > > "person-period" form.
> > > I would appreciate getting feedback on the following:
> > > My data are already organized by ID and year in "long" panel data
> > > form (iis ID, tis year) with year = 1984, 1985,...1998.
> > > A. Do I need to -expand- the data set?
> > > I am thinking I just need to generate the analysis time
> > > variable, with something like:
> > > (A1) by ID: generate TIME = _n;
> > > please see also question B, below.
> > > B. How do I deal with delayed entry?
> > > Assuming people first become at risk of not living independently at age 65,
> > > which may not be the age at which they are first observed in my data,
> > > how do I incorporate this information in my analysis?
> >
>
> > Suppose first that there is no delayed entry -- in which case you would
> > need a row in the data set corresponding to each year that each person
> > was /at risk of experiencing the event of interest/. If you were to
> > assume the first year at risk corresponds to age 65, you need rows for
> > each person for each year corresponding to age 65+. As most persons were
> > already older than 65 in the first survey year (1984 in GSOEP), you
> > would need to create new rows in the data corresponding to those ages
> > before the beginning of the survey. The TIME variable starts with 1 for
> > age 65, then 2 for age 66, and so on. [You would also need to 'spread'
> > values for explanatory variables back onto these new person-year obs.]
> > -expand- could probably be used to create the required data structure,
> > making use of the -if- qualifier to ensure that the correct number of
> > new person-year observations gets generated for each person. (As the
> > respondents were of different ages in 1984, the number of new data rows
> > will differ from person to person.)
> >
>
> Ideally, I would like to use some time-varying variables (e.g. income)
> in the analysis. What would be the appropriate thing to do for these
> variables when I 'spread' them?
You would have to create the appropriate values. Of course the fact
that those new person-year observations are before the start of the
panel may constrain what you are able to create. But in fact if you
make the delayed-entry 'correction' as discussed then the TVCs for
pre-panel years are not needed.
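
To make the general principle concrete, here is a minimal sketch on toy
data. ID and year are the names used in this thread; age, income, and all
the values are hypothetical, and the sketch assumes a Stata version in
which -expand- supports the generate() option. It is an illustration under
those assumptions, not a definitive recipe.

```stata
* Toy person-year panel (hypothetical values); first survey year is 1984
clear
input ID year age income
1 1984 67 20
1 1985 68 21
2 1984 65 30
2 1985 66 29
end

* Years at risk before each person's first observed year (age 65 onwards)
bysort ID (year): generate nprior = age[1] - 65
bysort ID (year): generate first = (_n == 1)

* Duplicate each person's first row once per pre-panel year at risk;
* copy = 1 flags the newly created rows
generate ncopies = nprior + 1
expand ncopies if first, generate(copy)

* Shift the k-th copy back k years; the time-varying covariate is unknown there
bysort ID copy: generate k = _n if copy
replace year = year - k if copy
replace age = age - k if copy
replace income = . if copy

* Duration counter: 1 at age 65, 2 at age 66, and so on
generate TIME = age - 64
sort ID year
list ID year age income TIME copy, sepby(ID)

* Delayed-entry correction: discard the pre-panel person-years
drop if copy
```

In this sketch the rows with missing income are exactly the ones that get
dropped, which is why the pre-panel TVC values are never needed.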
> > Now, to control for the delayed entry aspect and get the likelihood
> > correct, all you need do is create the data structure as just stated,
> > but throw away the person-years corresponding to pre-1984 (first survey
> > year). (Note that the duration counter TIME does not start from 1 in
> > most cases in the delayed-entry version of the data set.)
>
> I am afraid I am still missing something. Please forgive me if this is a
> silly question. If I understand correctly, the only variable I really
> need is the appropriate 'analysis time' counter. I will throw away all the
> records generated through -expand-. Correct?
I was attempting to discuss general principles rather than special
cases, hoping to help understanding. It appears (from a brief glance)
that, given that you already have person-year data for the period
covered by the panel, you will not have to -expand-, and your code
achieves what is required to generate the correct duration counter.
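
As an illustration of the shortcut, here is a minimal sketch that builds
the duration counter directly on the existing person-year rows, with no
-expand-. It assumes a variable holding age in each survey year; that
variable (age) and the toy values are hypothetical. A plain within-person
row count would start at 1 for everyone, so the sketch derives the counter
from age instead.

```stata
* Toy person-year panel (hypothetical values)
clear
input ID year age
1 1984 65
1 1985 66
1 1986 67
2 1984 70
2 1985 71
3 1984 63
3 1985 64
3 1986 65
3 1987 66
end

* Keep only person-years at risk (age 65 or older)
keep if age >= 65

* Duration counter: 1 at age 65, 2 at age 66, and so on
generate TIME = age - 64

* Value of TIME at which each person is first observed at risk
bysort ID (year): generate entryTIME = TIME[1]

list ID year age TIME entryTIME, sepby(ID)
```

Person 2 is first observed at age 70, so TIME starts at 6 rather than 1
for that person, which is the delayed-entry pattern described above.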
Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk