Dear Stephen,
Thank you very much for your help. Your notes are very helpful.
As we understand from your reply, there are two options for dealing with
left censoring in our case:
1. Assume that the hazard does not vary with survival time, drop the
duration variable, and see what happens.
2. Drop those workers who joined before April 1996.
Shabbar Jaffry
Yaseen Ghulam
st: Survival Analysis Issue
From: "Yaseen Ghulam" <[email protected]>
To: [email protected]
Subject: st: Survival Analysis Issue
Date: Wed, 05 Nov 2003 11:23:40 -0000
Dear Stata users,
We are currently working on a study of workers' behaviour in terms of
leaving the organisation prematurely, before their contracts expire. In
particular, the idea is to find out who is likely to quit, and when,
using past data. We would appreciate it if someone could provide some
help.
The data we have are typical organisational data. Let me briefly explain
what data set we have.
In our administrative data set we have person-month data with monthly
observations from April 1996 to July 2002 (75 monthly spells - time) for
approx. 73 thousand workers (3.39m cases), implying that these workers
came under observation from April 1996 and stayed under observation
until July 2002. Of these 73 thousand workers, roughly 20 thousand quit
the organisation prematurely during the observation period (20 thousand
fail cases). The remainder are right censored.
The data set also includes individuals who joined before 1996 (i.e.
before the observation window). However, we have no information on those
who joined before 1996 and left before 1996 (left censoring).
For those who joined after 1996 (delayed entry) and either stayed or
left before the end of the observation period (July 2002), we have
complete data.
Our data set has the usual job-related variables (e.g. the job
performed) and demographic variables (e.g. gender, marital status). We
have also added external factors (e.g. number of vacancies, claimant
counts, manufacturing productivity index, inflation rate, manufacturing
sector earnings index) to the data set. These time-varying covariates
have been merged with the above data set by calendar month (time).
Our questions are:
1. Can Stata deal with left censoring, right censoring and left
truncation (delayed entry) simultaneously?
2. Should we use only those workers who joined after April 1996 and
discard those who joined before 1996 (because of left censoring)?
3. We would like to predict which workers are likely to leave, and when.
This means calculating the probability of failure and the expected time
of failure over the next few years for right-censored workers, on the
basis of the observation period data (April 1996 to July 2002). If there
are many right-censored cases, does this affect the quality of the
predictions? I suppose these predictions should be limited to the next
6 years only, as our observation span is only 6 years.
Has anybody written any macros or programmes in Stata to carry out these
predictions within a survival analysis framework, taking into account
the issues mentioned above and the type of data we have?
We would greatly appreciate any help.
Shabbar Jaffry
Yaseen Ghulam
University of Portsmouth
U.K.
st: RE: Survival Analysis Issue
From: "Stephen P. Jenkins" <[email protected]>
To: <[email protected]>
Subject: st: RE: Survival Analysis Issue
Date: Thu, 6 Nov 2003 10:18:38 -0000
> -----Original Message-----
> From: [email protected] On Behalf Of Yaseen Ghulam
> Sent: 05 November 2003 11:24
> To: [email protected]
> Subject: st: Survival Analysis Issue
>
>
> Dear Stata users,
>
> We are currently working on a study of workers' behaviour in terms of
> leaving the organisation prematurely, before their contracts expire.
> In particular, the idea is to find out who is likely to quit, and
> when, using past data. We would appreciate it if someone could provide
> some help.
>
> The data we have are typical organisational data. Let me briefly
> explain what data set we have.
>
> In our administrative data set we have person-month data with monthly
> observations from April 1996 to July 2002 (75 monthly spells - time)
> for approx. 73 thousand workers (3.39m cases), implying that these
> workers came under observation from April 1996 and stayed under
> observation until July 2002. Of these 73 thousand workers, roughly 20
> thousand quit the organisation prematurely during the observation
> period (20 thousand fail cases). The remainder are right censored.
>
> The data set also includes individuals who joined before 1996 (i.e.
> before the observation window). However, we have no information on
> those who joined before 1996 and left before 1996 (left censoring).
>
> For those who joined after 1996 (delayed entry) and either stayed or
> left before the end of the observation period (July 2002), we have
> complete data.
>
... snip ...
>
> Our questions are:
>
> 1. Can Stata deal with left censoring, right censoring and left
> truncation (delayed entry) simultaneously?
> 2. Should we use only those workers who joined after April 1996 and
> discard those who joined before 1996 (because of left censoring)?
You have interval-censored (banded) survival time data, a.k.a. discrete
time data, for which it is no problem at all to handle left truncation
combined with right censoring.
[Have a look at the lecture notes and Stata lessons at
http://www.iser.essex.ac.uk/teaching/stephenj/ec968/index.php]
Left-censored data are more problematic. They are straightforward to
handle if you are prepared to assume that the hazard rate does not vary
with survival time. That's a strong, probably unacceptable, assumption --
but you might want to see what happens.
Otherwise the standard way of handling left censoring is to drop those
spells.
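To illustrate the point, here is a minimal sketch in Python (not Stata, and
not taken from the lessons cited above) with made-up spells and hypothetical
function names. It assumes that a left-truncated spell has a known start
date, so the worker can be placed on the tenure axis and simply enters the
risk set late. A spell with an unknown start date cannot be placed on that
axis, so it contributes only if the hazard is assumed constant.

```python
from collections import defaultdict


def discrete_hazard(spells):
    """Life-table hazard by month of tenure.

    spells: iterable of (entry, exit, quit) tuples, all hypothetical.
      entry: months of tenure already completed when observation starts
             (0 if observed from the hiring date, > 0 for delayed entry)
      exit:  last month of tenure in which the worker is observed
      quit:  True if the worker quit in month `exit`, False if the
             spell is right censored there
    """
    at_risk = defaultdict(int)
    quits = defaultdict(int)
    for entry, exit_, quit_ in spells:
        # A worker is at risk only in the tenure months actually observed.
        for month in range(entry + 1, exit_ + 1):
            at_risk[month] += 1
        if quit_:
            quits[exit_] += 1
    return {m: quits[m] / at_risk[m] for m in sorted(at_risk)}


def constant_hazard(months_observed, n_quits):
    """Single monthly hazard under the 'no duration dependence' assumption."""
    return n_quits / sum(months_observed)


spells = [
    (0, 3, True),    # joined inside the window, quit in month 3
    (0, 5, False),   # joined inside the window, right censored
    (2, 4, True),    # joined earlier, start date known: delayed entry
    (3, 5, False),   # delayed entry and right censored
]
hazard = discrete_hazard(spells)

# A fifth worker has an unknown start date, 4 observed months and no quit.
# That spell cannot be placed on the tenure axis, yet it still counts here.
pooled = constant_hazard([3, 5, 2, 2, 4], n_quits=2)
```

In this toy example the delayed-entry workers add to the risk set only from
their first observed tenure month, which is all that left truncation
requires in a person-month setup.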
> 3. We would like to predict which workers are likely to leave, and
> when. This means calculating the probability of failure and the
> expected time of failure over the next few years for right-censored
> workers, on the basis of the observation period data (April 1996 to
> July 2002). If there are many right-censored cases, does this affect
> the quality of the predictions? I suppose these predictions should be
> limited to the next 6 years only, as our observation span is only 6
> years.
> Has anybody written any macros or programmes in Stata to carry out
> these predictions within a survival analysis framework, taking into
> account the issues mentioned above and the type of data we have?
If you look at the lessons on discrete time models cited above, you'll
see examples of Stata code showing how to do within-sample and
out-of-sample predictions of the sort that you are asking about.
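The arithmetic behind such predictions can be sketched as follows. This is
a Python illustration with hypothetical function names and made-up numbers,
not the code from the lessons. It assumes you already have a predicted
hazard for each future month for a worker who is still employed (for
example, fitted values from a discrete time model estimated on the
person-month data).

```python
def conditional_survival(hazards):
    """Probability of still being employed after 1, 2, ... further months,
    given employment now. `hazards` holds the predicted monthly hazards
    for those future months (hypothetical inputs)."""
    survival = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h          # survive this month given survival so far
        survival.append(s)
    return survival


def prob_leave_within(hazards):
    """Probability of quitting at some point within the horizon."""
    return 1.0 - conditional_survival(hazards)[-1]


def expected_months_within(hazards):
    """Expected number of further months completed, truncated at the
    horizon (a restricted mean, not the unrestricted expected duration)."""
    return sum(conditional_survival(hazards))


predicted = [0.1, 0.2, 0.5]   # made-up hazards for the next three months
surv = conditional_survival(predicted)
p_leave = prob_leave_within(predicted)
exp_months = expected_months_within(predicted)
```

With these made-up hazards the survival probabilities are 0.9, 0.72 and
0.36, so the chance of leaving within three months is 0.64. The expected
duration is only defined up to the chosen horizon, which is one reason to
keep the prediction horizon within the range of durations observed in the
data.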
Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
<http://www.iser.essex.ac.uk>