> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Yaseen Ghulam
> Sent: 05 November 2003 11:24
> To: [email protected]
> Subject: st: Survival Analysis Issue
>
>
> Dear Stata users,
>
> Currently we are working on a study of workers' behaviour in terms of
> leaving the organisation prematurely, before their contract expires.
> In particular, the idea is to find out who is likely to quit, and when,
> using past data. We would appreciate it if someone could provide some help.
>
> The data we have are typical organisational data. Let me briefly explain
> what data set we have.
>
> In our administrative data set we have person-month data with monthly
> observations from April 1996 to July 2002 (75 monthly spells - time)
> for approx. 73 thousand workers (3.39m cases), implying that these
> workers came under observation from April 1996 and stayed under
> observation until July 2002. Of these 73 thousand workers, roughly
> 20 thousand quit the organisation prematurely during the observation
> period (20 thousand fail cases). The remainder are right censored.
>
> In the dataset we also have individuals who joined before 1996
> (observation window). However, we do not have information on those
> who joined before 1996 and left before 1996 (left censoring).
>
> For those who joined after 1996 and either stayed or left (delayed
> entry) before the end of the observation period (July 2002), we have
> complete data.
>
... snip ...
>
> Our questions are:
>
> 1. Can Stata deal with both left and right censoring and left
> truncation (delayed entry) simultaneously?
> 2. Should we use only those workers who joined after April 1996 and
> throw away those who joined before 1996 (due to left censoring)?
You have interval-censored (banded) survival time data, a.k.a.
discrete-time data, for which it is no problem at all to handle
left-truncated data combined with right censoring.
[Have a look at the lecture notes and Stata lessons at
http://www.iser.essex.ac.uk/teaching/stephenj/ec968/index.php]
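To make the left-truncation point concrete, here is a minimal sketch, not
taken from the lessons themselves. It assumes person-month data with one row
per worker per observed month and hypothetical variables id, month (a Stata
monthly date), joinmonth (the month the worker joined, assumed known) and
quit (1 in the month the worker leaves, 0 otherwise). The log-tenure
specification of duration dependence is only one possible choice.

```stata
* Hypothetical variable names: id, month (%tm), joinmonth (%tm), quit (0/1).
* One row per worker per month observed between April 1996 and July 2002.

* Analysis time is tenure since joining, not time since the window opened.
generate tenure = month - joinmonth + 1

* Workers who joined before April 1996 simply contribute no rows for the
* tenure values before they came under observation. That is all that is
* needed to handle delayed entry (left truncation) in person-period data.
generate logt = ln(tenure)

* Discrete-time proportional hazards model (complementary log-log link).
cloglog quit logt, nolog

* Alternative: logistic hazard model on the same person-month rows.
logit quit logt, nolog
```

Right-censored workers need no special treatment either: their rows all
have quit equal to 0.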
Left-censored data are more problematic. It's straightforward to handle if
you are prepared to assume that the hazard rate does not vary with
survival time. That's a strong, probably unacceptable, assumption -- but
you might want to see what happens.
Otherwise the standard way of handling the left-censoring is to drop
those spells.
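As a rough sketch of these two options, assuming hypothetical variables
quit (0/1), logt (log of tenure), joinmonth (missing when the start date is
unknown) and illustrative covariates age and female:

```stata
* Hypothetical variables: quit, logt, joinmonth, age, female.

* (a) Constant hazard: no duration terms in the model, so spells with an
*     unknown start date can stay in the estimation sample.
cloglog quit age female, nolog

* (b) Duration dependence allowed: use only spells with a known start date.
cloglog quit logt age female if !missing(joinmonth), nolog
```

Comparing the covariate coefficients from (a) and (b) gives some feel for
how much the constant-hazard assumption matters.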
> 3. We would like to predict which workers are likely to leave, and when.
> This means calculating the probability of failure and the expected time
> of failure over the next few years for right-censored workers on the
> basis of the observation period data (April 1996 to July 2002). If there
> are many right-censored cases, does this affect the quality of the
> predictions? I suppose these predictions should be limited to the next
> 6 years only, as our observation span is only 6 years.
> Has anybody written any macros or programmes in Stata to carry out these
> predictions, taking into account the issues mentioned above and the type
> of data we have, within a survival analysis framework?
If you look at the lessons on discrete time models cited above, you'll
see examples of Stata code showing how to do within-sample and
out-of-sample predictions of the sort that you are asking about.
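For orientation, one way such predictions might be set up is sketched
below. This is an illustration under stated assumptions rather than the
code from the lessons: the variables id, tenure, logt and quit are
hypothetical names for person-month data, the 72-month horizon follows the
6 years suggested in the question, and the only covariate is log tenure.
Any other time-varying covariates would need assumed future values.

```stata
* Hypothetical variables: id, tenure (months since joining), logt, quit (0/1).
cloglog quit logt, nolog

* Flag the last observed month of workers who had not quit (right censored).
bysort id (tenure): generate byte censored = (_n == _N) & quit == 0

* Add 72 future person-months per censored worker; future = 1 marks them.
expand 73 if censored, generate(future)
bysort id future: replace tenure = tenure + _n if future
replace logt = ln(tenure) if future
replace quit = . if future

* Predicted hazard for every row, observed (within-sample) and future.
predict h, pr

* Probability of still being employed at each future month, conditional on
* having survived to the end of the observation window.
generate lnsurv = cond(future, ln(1 - h), 0)
bysort id (tenure): generate s_cond = exp(sum(lnsurv))
replace s_cond = . if !future
```

The predicted probability of quitting by a given future month is then
1 - s_cond. Predictions at tenures well beyond those observed in the data
rest entirely on the assumed shape of the duration dependence.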
Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
http://www.iser.essex.ac.uk