Dear Stata users,
Currently we are working on a study of workers' behaviour in terms of
leaving the organisation prematurely, i.e. before their contract
expires. In particular, the idea is to use past data to find out who is
likely to quit and when. We would appreciate any help.
Our data are typical organisational data. Let me briefly explain what
the data set contains.
In our administrative data set we have person-month data, with monthly
observations from April 1996 to July 2002 (75 monthly periods), for
approximately 73,000 workers (3.39 million person-month records). In
other words, no worker is observed before April 1996 or after July 2002.
Of these 73,000 workers, roughly 20,000 quit the organisation
prematurely during the observation period (failures); the remaining
workers are right-censored.
The data set also includes individuals who joined before the observation
window (i.e. before April 1996). However, we have no information on
those who both joined and left before April 1996 (left censoring). For
those who joined after April 1996 and either stayed or left (delayed
entry) before the end of the observation period (July 2002), we have
complete data.
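To make the spell layout described above concrete, here is a minimal Python sketch (an illustration, not Stata code) that collapses one worker's person-month rows into a single survival record, with analysis time measured as tenure in months since hire. It assumes the hire month is known for every worker, including those hired before April 1996, so that such workers enter the risk set at the tenure they had already reached in April 1996 rather than at tenure zero. The month coding and the names `month_index`, `Spell` and `to_spell` are hypothetical. In Stata this structure is normally declared with `stset`, which has `id()`, `origin()` and `enter()` options for this purpose.

```python
from dataclasses import dataclass


def month_index(year: int, month: int) -> int:
    """Hypothetical integer coding of calendar months (consecutive months differ by 1)."""
    return year * 12 + (month - 1)


WINDOW_START = month_index(1996, 4)  # April 1996
WINDOW_END = month_index(2002, 7)    # July 2002


@dataclass
class Spell:
    worker_id: int
    entry: int    # tenure (months since hire) at the start of the first observed month
    exit: int     # tenure at the end of the last observed month
    event: bool   # True if a premature quit is observed, False if right-censored


def to_spell(worker_id, hire_month, person_months):
    """Collapse one worker's (calendar_month, quit_flag) rows into one spell.

    Assumes the rows are consecutive months inside the observation window;
    gaps are not checked.
    """
    months = sorted(m for m, _ in person_months)
    first, last = months[0], months[-1]
    if first < WINDOW_START or last > WINDOW_END:
        raise ValueError("person-month row outside the observation window")
    event = any(flag for _, flag in person_months)
    # Workers hired before April 1996 get entry > 0: in our data they are
    # only at risk from the tenure reached when observation began.
    entry = first - hire_month
    exit_ = last - hire_month + 1
    return Spell(worker_id, entry, exit_, event)


# Hired January 1996, observed April to June 1996, quits in June 1996.
rows = [(month_index(1996, 4), False),
        (month_index(1996, 5), False),
        (month_index(1996, 6), True)]
print(to_spell(1, month_index(1996, 1), rows))
# Spell(worker_id=1, entry=3, exit=6, event=True)
```

A worker hired in April 1996 or later gets `entry = 0`; only those hired earlier start with a positive entry time.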
Our data set contains the usual job-related variables (e.g. the job each
worker performs) and demographic variables (e.g. gender, marital
status). We have also added external factors (e.g. number of vacancies,
claimant counts, manufacturing productivity index, inflation rate,
manufacturing-sector earnings index) to the data set. These
time-varying covariates have been merged with the person-month data by
calendar month.
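The calendar-month merge described above is a lookup join: every person-month row picks up the external covariates for its own calendar month, so those covariates change as each worker's tenure advances. A minimal Python sketch, assuming each row carries a hypothetical `cal_month` key and the external series are held in a dict keyed by the same month label; the covariate names and values in the example are toy placeholders, not real statistics:

```python
def merge_by_calendar_month(person_months, external_by_month):
    """Attach month-level external covariates to each person-month row.

    person_months: list of dicts, each with a 'cal_month' key.
    external_by_month: dict mapping calendar month -> dict of covariates.
    """
    merged = []
    for row in person_months:
        month = row["cal_month"]
        if month not in external_by_month:
            # Fail loudly rather than silently leaving covariates missing.
            raise KeyError(f"no external covariates for calendar month {month}")
        merged.append({**row, **external_by_month[month]})
    return merged


# Toy example: placeholder values only, not real labour-market figures.
external = {
    "1996-04": {"vacancies": 100.0, "inflation": 2.0},
    "1996-05": {"vacancies": 110.0, "inflation": 2.1},
}
rows = [
    {"worker_id": 1, "cal_month": "1996-04"},
    {"worker_id": 2, "cal_month": "1996-04"},
    {"worker_id": 1, "cal_month": "1996-05"},
]
merged = merge_by_calendar_month(rows, external)
```

All workers observed in the same calendar month receive identical external values, which is why these covariates vary over time but not across workers within a month.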
Our questions are:
1. Can Stata deal with left censoring, right censoring and left
truncation (delayed entry) simultaneously?
2. Should we use only those workers who joined after April 1996 and
discard those who joined before April 1996 (because of left censoring)?
3. We would like to predict which workers are likely to leave and when.
This means calculating, for the right-censored workers, the probability
of failure and the expected time to failure over the next few years,
based on the observation-period data (April 1996 to July 2002). If a
large share of cases is right-censored, does this affect the quality of
the predictions? I suppose these predictions should be limited to the
next 6 years, as our observation window spans only about 6 years.
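The prediction in question 3 rests on a standard identity: for a worker still employed at tenure t, the probability of quitting within the next u months is 1 - S(t+u)/S(t), where S is the survivor function of tenure. The Python sketch below illustrates this with a covariate-free Kaplan-Meier estimate that handles delayed entry by counting a spell in the risk set only after its entry time. It is a deliberately simplified stand-in, not the method of this study: it gives the same prediction to every worker with the same tenure, so worker-specific predictions would need S from a regression model that includes the job, demographic and external covariates. The function names and toy spells are hypothetical.

```python
def km_delayed_entry(spells):
    """Kaplan-Meier estimate allowing delayed entry (left truncation).

    spells: list of (entry, exit, event) with entry < exit, in tenure months.
    A spell is in the risk set at time t only if entry < t <= exit.
    Returns [(t, S(t))] at each distinct event time.
    """
    event_times = sorted({exit_ for _, exit_, event in spells if event})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for entry, exit_, _ in spells if entry < t <= exit_)
        quits = sum(1 for _, exit_, event in spells if event and exit_ == t)
        surv *= 1.0 - quits / at_risk
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Step-function lookup: value at the last event time <= t (1.0 before any)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def prob_quit_within(curve, t, u):
    """P(quit in (t, t+u] | still employed at tenure t) = 1 - S(t+u) / S(t)."""
    s_t = survival_at(curve, t)
    if s_t == 0.0:
        raise ValueError("S(t) is zero, so the conditional probability is undefined")
    return 1.0 - survival_at(curve, t + u) / s_t


# Toy spells (entry, exit, event); not real data.
spells = [(0, 2, True), (0, 4, False), (1, 3, True), (2, 5, True)]
curve = km_delayed_entry(spells)
print(prob_quit_within(curve, 2, 1))  # approximately 1/3
```

Note that `survival_at` simply carries the last estimate forward, so values of t+u beyond the longest observed tenure are extrapolations that the data do not support.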
Has anybody written macros or programs in Stata that carry out such
predictions within a survival-analysis framework, taking into account
the issues above and the type of data we have?
We would greatly appreciate any help.
Shabbar Jaffry
Yaseen Ghulam
University of Portsmouth
U.K.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/