Dear Matthias,
although my experience with survival analysis is basic, I will try to give a
tentative answer to your question, provided I have understood it correctly.
My first piece of advice is to estimate the Kaplan-Meier survivor function on
your dataset, as follows:
---------------------------begin example-----------------------------------
* Six firms: In = year of establishment, Out = last year observed
clear
input id In Out failure
1 1977 1981 1
2 1999 2002 0
3 1980 1981 1
4 1979 1990 1
5 1987 1995 1
6 1982 1985 1
end
* Firm 2 is right-censored (failure == 0); all the others fail
g risk_time = Out - In
stset risk_time, id(id) failure(failure==1)
* Kaplan-Meier survivor function: table and graph
sts list
sts graph
-----------------------end example----------------------------------
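Since your data have several records per firm, another option may be to
-stset- them in that layout, so that delayed entry and gaps are kept. What
follows is only a minimal sketch under assumptions of mine: I take each record
to cover the one-year interval ending in year (your begin = year - 1), I leave
out the covariate X, and I have not tried it on your real data.

```stata
* Multiple-record layout copied from your sketch (covariate X omitted)
clear
input id year failure establishment
1 1981 1 1977
2 2000 0 1999
2 2001 0 1999
2 2002 0 1999
3 1981 1 1980
4 1980 0 1979
4 1981 0 1979
4 1989 0 1979
4 1990 1 1979
5 1992 0 1987
5 1995 1 1987
6 1983 0 1982
6 1984 0 1982
6 1985 1 1982
end
* Assumption: each record spans the interval (year-1, year]
generate begin = year - 1
* Analysis time is measured from the year of establishment
stset year, id(id) time0(begin) origin(time establishment) failure(failure)
* Summarises entry times, exit times and gaps per firm
stdescribe
sts list
```

-stdescribe- should then show how many firms have gaps and how much time falls
into them, which is a quick way to check whether the gaps and the late entries
have been set up as you intended.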
My second piece of advice: for more details on this topic, I would refer you
to the following references:
http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/index.php
Cleves M, Gould W and Gutierrez R. An Introduction to Survival Analysis
Using Stata, 2nd rev ed. College Station, TX: Stata Press.
HTH and Kind Regards,
Carlo
-----Original message-----
From: [email protected]
[mailto:[email protected]] On behalf of Flückiger
Matthias
Sent: Monday, 4 February 2008 9.37
To: [email protected]
Subject: st: Correct formatting of survival data
Dear Statalisters
I am currently trying to analyse a data set on firm survival.
I have read up on various sources how to transform the data into the
appropriate survival analysis format.
Unfortunately I don't know anybody familiar with the topic of survival
analysis, so I don't know if what I've done so far is really correct.
If experienced survival data analysts could have a glance at my approach and
comment, that would be great.
Here is a sketch of what my dataset looks like:
id  year  X    failure  establishment
1   1981  X11  1        1977
2   2000  X21  0        1999
2   2001  X22  0        1999
2   2002  X23  0        1999
3   1981  X31  1        1980
4   1980  X41  0        1979
4   1981  X42  0        1979
4   1989  X43  0        1979
4   1990  X44  1        1979
5   1992  X45  0        1987
5   1995  X51  1        1987
6   1983  X61  0        1982
6   1984  X62  0        1982
6   1985  X63  1        1982
So there is left truncation, right censoring and possibly gaps within an id.
Continuous time analysis:
The commands I used to -snapspan- and -stset- the data set are:
g begin=year-1
snapspan id year failure, g(begin_span) replace
stset year, id(id) time0(begin) origin(time establishment) f(failure)
Am I making any (obvious) mistakes here?
In particular, I am not absolutely sure if my 'time0()' definition is ok.
I've tried to define a variable within the 'snapspanning process' (i.e.
begin_span) but Stata does not recognise the gaps in that case.
Discrete time analysis:
My main question here is whether I can include the firms with gaps into a
cloglog analysis or not (given I brought the data into an appropriate format
for analysing a cloglog model).
Thanks for any tips or comments
Mat
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/