Dear Statalisters,
I am currently trying to analyse a data set on firm survival.
I have read various sources on how to transform the data into the appropriate survival analysis format.
Unfortunately, I don't know anybody familiar with survival analysis, so I cannot tell whether what I have done so far is correct.
If experienced survival data analysts could have a glance at my approach and comment, that would be great.
Here is a sketch of what my dataset looks like:
id  year  X    failure  establishment
1   1981  X11  1        1977
2   2000  X21  0        1999
2   2001  X22  0        1999
2   2002  X23  0        1999
3   1981  X31  1        1980
4   1980  X41  0        1979
4   1981  X42  0        1979
4   1989  X43  0        1979
4   1990  X44  1        1979
5   1992  X45  0        1987
5   1995  X51  1        1987
6   1983  X61  0        1982
6   1984  X62  0        1982
6   1985  X63  1        1982
So there is left truncation, right censoring and possibly gaps within an id.
Continuous-time analysis:
The commands I used to -snapspan- and -stset- the data set are:
g begin=year-1
snapspan id year failure, g(begin_span) replace
stset year, id(id) time0(begin) origin(time establishment) f(failure)
Am I making any (obvious) mistakes here?
In particular, I am not sure whether my time0() definition is correct. I also tried using the variable created by -snapspan- itself (i.e. begin_span) as time0(), but in that case Stata does not recognise the gaps.
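
To make the time0() question concrete, here is a minimal sketch that checks how -stset- treats the gaps. It is an illustration under assumptions, not my actual do-file: it types in the sketch data without the covariate X, skips the -snapspan- step, and assumes each record describes the interval (year-1, year].

```stata
* Sketch only: the sketch data without X, no -snapspan- step.
clear
input id year failure establishment
1 1981 1 1977
2 2000 0 1999
2 2001 0 1999
2 2002 0 1999
3 1981 1 1980
4 1980 0 1979
4 1981 0 1979
4 1989 0 1979
4 1990 1 1979
5 1992 0 1987
5 1995 1 1987
6 1983 0 1982
6 1984 0 1982
6 1985 1 1982
end

* Assumption: each record covers the one-year interval (year-1, year].
generate begin = year - 1

stset year, id(id) time0(begin) origin(time establishment) failure(failure)

* Inspect the variables -stset- created. Analysis time is measured from
* establishment, so for id 4 the 1981 record ends at _t = 2 and the
* 1989 record starts at _t0 = 9, i.e. the years in between are a gap.
sort id year
list id year begin establishment _t0 _t _d _st, sepby(id) noobs

* Summary of the st data, including the number of subjects with a gap.
stdescribe
```

If the gaps show up in the -stdescribe- output and in _t0/_t as intended, the time0() definition at least does what I expect; whether one-year intervals are the right assumption for my data is a separate question.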
Discrete-time analysis:
My main question here is whether firms with gaps can be included in a cloglog analysis (assuming I have brought the data into the appropriate format for a cloglog model).
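
For concreteness, this is roughly the kind of setup I have in mind, as a minimal sketch under assumptions: one row per firm-year at risk, the covariate X left out, duration measured in years since establishment, and log duration as a (hypothetical) choice of baseline specification. Years that fall into a gap simply contribute no rows; whether that is legitimate is exactly what I am unsure about.

```stata
* Sketch only: person-period layout built from the sketch data (X omitted).
clear
input id year failure establishment
1 1981 1 1977
2 2000 0 1999
2 2001 0 1999
2 2002 0 1999
3 1981 1 1980
4 1980 0 1979
4 1981 0 1979
4 1989 0 1979
4 1990 1 1979
5 1992 0 1987
5 1995 1 1987
6 1983 0 1982
6 1984 0 1982
6 1985 1 1982
end

* Duration in years since establishment at the end of each observed year.
generate duration = year - establishment

* Assumption: baseline hazard enters through log duration; duration
* dummies or a polynomial would be alternatives.
generate lnt = ln(duration)

* Discrete-time hazard model with a complementary log-log link.
* Firm-years inside a gap are not in the data, so they are not used.
cloglog failure lnt
```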
Thanks for any tips or comments.
Mat