Dear Statalisters,
I previously posted a thread about expanding observations when the total number of observations (n) varies.
Thanks to Joseph Coveney and Maarten Buis, I think I'm closer to the data manipulation that I need to do before further statistical analyses.
In summary:
After generating a dummy variable (fail = 1), I -stset- my data (multiple records per subject, where subjects are followed throughout their work history) as follows:
stset enddate, id(studyno1) failure(fail) enter(time startdat) exit(time .) scale(365.25)
Then I -stsplit- as follows:
stsplit year,at(0(1)max)
The result is not quite what I need Stata to do. Here is what happened to the records of one of the 14,300 subjects:
Original data (declared as survival-time)
+--------------------------------------------------------------------------------+
| studyno1    startdat     enddate   jobdur~n   _st   _d          _t         _t0 |
|--------------------------------------------------------------------------------|
|   100091   29 Mar 77   20 Jul 80       1209     1    1   3.3100616           0 |
|   100091   21 Jul 80   09 Jan 81        172     1    1   3.7837098   3.3100616 |
|   100091           .           .          .     0    .           .           . |
+--------------------------------------------------------------------------------+
Transformed data (after stsplit)
+--------------------------------------------------------------------------------+
| studyno1    startdat     enddate   jobdur~n   _st   _d          _t         _t0 |
|--------------------------------------------------------------------------------|
|   100091   29 Mar 77   31 Dec 77       1209     1    0          18   17.240246 |
|   100091   29 Mar 77   31 Dec 78       1209     1    0          19          18 |
|   100091   29 Mar 77   01 Jan 80       1209     1    0          20          19 |
|   100091   29 Mar 77   20 Jul 80       1209     1    1   20.550308          20 |
|   100091   21 Jul 80   31 Dec 80        172     1    0          21   20.550308 |
|   100091   21 Jul 80   09 Jan 81        172     1    1   21.023956          21 |
|   100091           .           .          .     0    .           .           . |
+--------------------------------------------------------------------------------+
As one can see, only job end dates are truncated and not the job start dates.
Also, the splitting does not always end at the end of the calendar year as can be seen in the 3rd record (where enddate = 01 Jan 80 ).
I fixed the start dates by setting each split record's start date to the previous record's end date within the same job:
by studyno1: replace startdat = enddate[_n-1] if jobduration == jobduration[_n-1]
However, I don't know how to tackle the scale issue that, I believe, causes the split to sometimes fall on January 1st of the following year rather than on December 31st of the current year.
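A quick check of this suspicion can be sketched as follows. It assumes that analysis time is measured in years of 365.25 days from Stata's date origin of 01 Jan 1960, which is what the _t0 value of 17.240246 for 29 Mar 77 suggests. The local macro names (cut, jan1) are only illustrative.

```stata
* For each integer-year cutpoint t, compare the cut expressed in days since
* 01jan1960 (t * 365.25) with the actual date of 1 January of year 1960 + t.
forvalues t = 18/21 {
    * where a cut at analysis time t falls, in days since 01jan1960
    local cut = `t' * 365.25
    * the true calendar date of 1 January of that year
    local jan1 = mdy(1, 1, 1960 + `t')
    display "t = `t'   cut (days) = " %9.2f `cut' "   1 Jan = " %td `jan1' "   gap (days) = " %5.2f (`jan1' - `cut')
}
```

If I have the arithmetic right, the gaps for t = 18, 19, 20 and 21 are 0.50, 0.25, 0.00 and 0.75 days. The cut therefore falls just before 1 January in three of the four years and lands exactly on 1 January 1980 at t = 20, which would match the 01 Jan 80 end date in the third record above.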
Any help would be greatly appreciated.
Thank you for your time
Hind Sbihi
School of Occupational and Environmental Hygiene
University of British Columbia