Dear Stata users,
I would appreciate some advice regarding the stset command.
I want to calculate the person-years each person was employed at a
specific company. Each person has a start and a finish date, but some
people held more than one job at the company during that period. I am
also interested in the total person-years per job category.
This is what the data looks like for the first 5 subjects (single-
record data):
+----------------------------------------------------------------+
| id   stdate      fndate      jdate1      jdate2      jdate3    |
|----------------------------------------------------------------|
|  1   27 Mar 95   31 Jul 02   27 Mar 95   15 Jun 99   15 Feb 02 |
|  2   27 Jan 92   07 Aug 92   27 Jan 92           .           . |
|  3   02 Oct 89   02 Dec 96   02 Oct 89   19 Jun 92           . |
|  4   25 Apr 94   09 Aug 96   25 Apr 94   06 Jun 94   04 Jul 94 |
|  5   17 Aug 98   31 Jul 02   17 Aug 98           .           . |
+----------------------------------------------------------------+
. stset fndate, fail(fail) origin(stdate) id(code) scale(365.25)
id: code
failure event: fail != 0 & fail < .
obs. time interval: (fndate[_n-1], fndate]
exit on or before: failure
t for analysis: (time-origin)/365.25
origin: time stdate
------------------------------------------------------------------------------
912 total obs.
0 exclusions
------------------------------------------------------------------------------
912 obs. remaining, representing
912 subjects
0 failures in single failure-per-subject data
3735.546 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 13.57153
For the person-years per job category I reshaped the data from wide to
long:
reshape long jdate jm, i(code) j(jobno)
jdate is the date when someone started a new job and jm is the job
category.
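To make the reshape step concrete, here is a minimal sketch on a toy copy of subject 3. The string helper variables s_jdate1-s_jdate3 and the jm codes (1 and 2) are made up for illustration and are not from my real data:

```stata
* Toy wide data: one record per person.
* Dates are those of subject 3 above; the jm codes are hypothetical.
clear
input code str9 s_jdate1 str9 s_jdate2 str9 s_jdate3 jm1 jm2 jm3
3 "02oct1989" "19jun1992" "" 1 2 .
end

* Convert the string dates to Stata daily dates
forvalues k = 1/3 {
    generate jdate`k' = date(s_jdate`k', "DMY")
    format jdate`k' %td
}
drop s_jdate*

* One record per person and job, indexed by jobno
reshape long jdate jm, i(code) j(jobno)

* Drop job slots that were never used
drop if missing(jdate)
list, sepby(code)
```

In this sketch each remaining record marks the start of one job, so the end of a job still has to be built from the next record's jdate (or from fndate for the last job).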
After some data manipulation this is what the multiple-record data
looks like for the first 5 subjects:
+-----------------------------+
| id stdate jdate2 |
|-----------------------------|
| 1 27 Mar 95 15jun1999 |
| 1 . 15feb2002 |
| 1 . 31jul2002 |
| 2 27 Jan 92 07aug1992 |
| 3 02 Oct 89 19jun1992 |
|-----------------------------|
| 3 . 02dec1996 |
| 4 25 Apr 94 06jun1994 |
| 4 . 04jul1994 |
| 4 . 09aug1996 |
|-----------------------------|
| 5 17 Aug 98 31jul2002 |
+-----------------------------+
I then stset the data again with:
. stset jdate2, fail(fail) origin(time stdate) id(code) scale(365.25)
id: code
failure event: fail != 0 & fail < .
obs. time interval: (jdate2[_n-1], jdate2]
exit on or before: failure
t for analysis: (time-origin)/365.25
origin: time stdate
------------------------------------------------------------------------------
1865 total obs.
0 exclusions
------------------------------------------------------------------------------
1865 obs. remaining, representing
912 subjects
0 failures in single failure-per-subject data
3736.758 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 13.57153
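Now the overall person-years estimate (total analysis time at risk) is
different: 3736.758 with the multiple-record data versus 3735.546 with
the single-record data, for the same 912 subjects.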
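One check that might help locate the difference is to compare, subject by subject, the time at risk in the long data with (fndate - stdate)/365.25. This is only a sketch: it assumes the long data are still stset as above and that stdate and fndate are still in the dataset, and the names st0, fn0, py_long, py_wide and first are new variables I made up for the check:

```stata
* Sketch only. Assumes the long data are stset as above and that stdate and
* fndate from the wide file are still present (assumption).
bysort code (jdate2): generate double st0 = stdate[1]
bysort code (jdate2): generate double fn0 = fndate[1]
format st0 fn0 %td

* Person-years per subject as stset sees them (_t0, _t and _st are created by stset)
egen double py_long = total((_t - _t0) * _st), by(code)

* Person-years per subject implied by the start and finish dates
generate double py_wide = (fn0 - st0) / 365.25

* List one line per subject where the two disagree
bysort code (jdate2): generate byte first = (_n == 1)
list code st0 fn0 py_long py_wide if first & abs(py_long - py_wide) > 0.001
```

Any subject listed would be one whose last jdate2 does not match fndate, or whose records are otherwise inconsistent with the wide file.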
Can anyone tell me what I did wrong and advise me on how to solve this
problem?
Kind regards,
Cornelia