On Sun, 1 Dec 2002 03:07:48 -0800 (PST) Enrica Croda
<[email protected]> wrote:
<snip>
> So, to recap, I now believe my data are grouped duration data...
> I understand that in this case I need to organize my data in the so-called
> "person-period" form.
> I would appreciate getting feedback on the following:
> My data are already organized by ID and year in "long" panel data
> form (iis ID, tis year) with year = 1984, 1985,...1998.
> A. Do I need to -expand- the data set? Am I correct in thinking
> that I do not? I am thinking I just need to generate the analysis time
> variable, with something like:
> (A1) by ID: generate TIME = _n;
> please see also question B, below.
> B. How do I deal with delayed entry?
> Assuming people first become at risk of not living independently at age 65,
> which may not be the age at which they are first observed in my data,
> how do I incorporate this information in my analysis?
Suppose first that there is no delayed entry -- in which case you would
need a row in the data set corresponding to each year that each person
was /at risk of experiencing the event of interest/. If you were to
assume the first year at risk corresponds to age 65, you need rows for
each person for each year corresponding to age 65+. Because the first
survey year (1984 in GSOEP) falls after age 65 for most persons, you
would need to create new rows in the data for the ages between 65 and
the age at the beginning of the survey. The TIME variable starts at 1
for age 65, then 2 for age 66, and so on. [You would also need to
'spread' values for explanatory variables back onto these new
person-year observations.]
-expand- could probably be used to create the required data structure,
making use of the -if- qualifier to ensure that the correct number of
new person-year observations gets generated for each person. (As the
respondents were of different ages in 1984, the number of new data rows
will differ from person to person.)
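
A rough sketch of how that -expand- step might look, not a tested recipe.
The toy data and the variable names age (age in years in each survey
year), first, ncopies and copy are assumptions made for illustration;
only ID, year and TIME come from the thread. The copied rows carry the
covariate values of the first observed year backwards, which is one crude
way of 'spreading' them.

```stata
* Toy long-form data: person 1 is first seen at age 67, person 2 at age 65
clear
input ID year age
1 1984 67
1 1985 68
2 1984 65
2 1985 66
end

* Flag each person's first observed row
bysort ID (year): generate byte first = (_n == 1)

* Total copies wanted of that first row: itself plus one per year back to age 65
generate ncopies = age - 64

* Only first rows of people first seen after age 65 are duplicated
expand ncopies if first & age > 65 & !missing(age)

* Date the copies backwards: copy 0 is the original row
bysort ID year: generate copy = _n - 1
replace year = year - copy
replace age  = age - copy

* Analysis time: 1 at age 65, 2 at age 66, and so on
generate TIME = age - 64
sort ID year
list ID year age TIME, sepby(ID)
```

Rows for ages below 65, if there are any, would still need to be dropped
because those person-years are not at risk under the assumption above.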
Now, to control for the delayed entry aspect and get the likelihood
correct, all you need to do is create the data structure as just
described, but throw away the person-years before 1984 (the first
survey year). (Note that in the delayed-entry version of the data set
the duration counter TIME does not start from 1 in most cases.) All this is
discussed in those lecture notes you cited, together with regression
models that you could apply once the data have been created.
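
The delayed-entry step might be sketched as follows, assuming the
expanded person-year data already exist. The toy data, the variable age
and the helper variable entryTIME are hypothetical and are there only to
make the example runnable.

```stata
* Toy expanded data: person 1 turned 65 in 1982, person 2 in 1984
clear
input ID year age TIME
1 1982 65 1
1 1983 66 2
1 1984 67 3
1 1985 68 4
2 1984 65 1
2 1985 66 2
end

* Discard the person-years before the first survey year
drop if year < 1984

* TIME on each person's first retained row shows where the counter starts
bysort ID (year): generate entryTIME = TIME[1]
list ID year TIME entryTIME, sepby(ID)
```

In this toy example the rows that remain are just the observed survey
rows, with TIME equal to age minus 64. That suggests TIME could perhaps
be computed directly from age without -expand- when everyone is first
observed at age 65 or later, but that shortcut is an assumption to check
against your own data.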
> C. Would the solution to question B be different if I plan to control for
> age in the 'regression' analysis?
Given the way you have defined your time-at-risk variable (in terms of
age), wouldn't "age" as an explanatory variable be perfectly correlated
with TIME?
> D. Do I still need to stset the variables?
No. -st- is designed primarily for continuous time duration models. You
can use the -st- utilities to reorganise your data and so on, but that
is a separate issue. You don't need -stset- in order to estimate
discrete time duration models.
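
For illustration only, a discrete time hazard model can be fitted to
person-period data with ordinary binary-outcome commands and no -stset-.
Everything below is a sketch on simulated data: event (1 in the
person-year in which the event occurs, 0 otherwise), x1, nev and lnTIME
are hypothetical names, and log(TIME) is just one possible choice of
duration dependence.

```stata
* Simulated person-period data, only so that the commands can run
clear
set seed 2002
set obs 300
generate ID = _n
generate x1 = runiform()
expand 8
bysort ID: generate TIME = _n
generate byte event = runiform() < 0.15

* Keep each person's rows up to and including the first event
bysort ID (TIME): generate nev = sum(event)
drop if nev > 1 | (nev == 1 & !event)

* Duration dependence enters as a regressor; no -stset- is involved
generate lnTIME = ln(TIME)

* Complementary log-log hazard
cloglog event lnTIME x1

* Logistic hazard as an alternative
logit event lnTIME x1
```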
Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk