Marjo Pyy-Martikainen <[email protected]> writes,
> I have a data containing multiple spells per person. The spells are measured
> in months. The data is in the following form:
>
> PERSON BEGIN END EVENT DUR
> 1 ( 0 1 ] 1 1
> 1 ( 4 13 ] 1 9
> 2 ( 15 25 ] 0 10
>
> where variables BEGIN and END are measured in calendar time (1 refers to Jan
> 1995, 2 to Feb 1995 and so on until 60, Dec 1999).
>
> I stset the data in the following way:
>
> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
>
> which means I want to "set the clock to zero" at the start of each spell.
> Now I would like to include a dummy for December months 12, 24, 36 and 48.
> It is thus a time-varying variable getting value 1 for the December months
> and 0 for other months. A spell may include zero, one or many December
> months. I suppose I should use stsplit and do some kind of episode
> splitting, but could someone help me and give me advice how I should do it
> with my data?
I have the solution. Before starting, let's look and see what Marjo has
already done. At first I thought Marjo had made a mistake, but I was wrong.
The -stset- command is just complicated enough that theoretical examination
does not work well; you can check that you have the intended result by listing
the _t0, _t, and _d variables that -stset- creates. So I entered the data
and typed the -stset- command. Then I typed
. list _t0 _t _d
+---------------+
| _t0 _t _d |
|---------------|
1. | 0 1 1 |
2. | 0 9 1 |
3. | 0 10 0 |
+---------------+
Analysis time ranges over (0,1] and again over (0,9] for the first person.
That's what I thought would happen, and it looks like an error, but notice
that Marjo said, "which means I want to set the clock to zero at the start of
each spell". Okay, the command works exactly as Marjo said it would.
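For anyone who wants to repeat that check, here is a minimal do-file sketch.
It assumes the three records are keyed in with -input- under the lowercase
variable names used in the -stset- command, and that the third record ends in
month 25 (consistent with its duration of 10). The -list- at the end is the
same "look at _t0, _t, and _d" step described above.

```stata
* Sketch only: the three example records, typed in by hand (an assumption
* about how the data were entered; the third spell is taken to be (15,25])
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end

* Marjo's original -stset-: the clock restarts at the start of each spell
stset end, failure(event) time0(begin) exit(time .) origin(time begin)

* Inspect what -stset- actually created
list person begin end _t0 _t _d
```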
Marjo now wants to add a dummy variable equal to 1 every December.
Without explanation (that's coming), here's the solution:
. gen recid = _n (1)
. stset end, id(person) failure(event) enter(begin) /// (2)
exit(time .) time0(begin)
. stsplit bot, at(12 13 24 25 36 37 48 49) (3)
. gen dummy = ( mod(bot,12)==0 & bot!=0 ) (4)
. stset end, id(recid) failure(event) time0(begin) /// (5)
exit(time .) origin(time begin)
I admit that the entire solution did not occur to me at the outset. In fact, I
went back and added the first line at the end, and modified the fifth. Here is
what did occur to me: We will have to use -stsplit-. -stsplit- wants to split
on analysis time, so we will first have to -stset- our data based on calendar
time, then -stsplit- the data, and finally we can -stset- our data the way we
really want it. The preliminary -stset- would allow us to generate the
dummy variable for December.
So let me explain.
Ignore line (1); remember, it didn't even occur to me until later.
Line (2) was the first line I wrote. It seemed the right way to -stset-
the data based on calendar time. I didn't get the command right the
first time, but after typing (2), I listed the data, saw what was wrong,
and eventually got (2) to work just as I wanted it. (What was wrong is that I
forgot exit(time .); these data, it turned out, had to be treated as
multiple-failure data at this step.) When I say I listed the data, what I do is
list _t0, _t, and _d, so I can see the time variables and outcome that will be
used in analysis. Here's what the data looked like after (2):
. list person _t0 _t _d
+------------------------+
| person _t0 _t _d |
|------------------------|
1. | 1 0 1 1 |
2. | 1 4 13 1 |
3. | 2 15 25 0 |
+------------------------+
Perfect; _t0 and _t correspond to the original month variables.
Now we can -stsplit-. We need to set the dummy to 1 for months 12, 24, 36,
and 48, which means we need to set it back to 0 for months 13, 25, 37, and
49. So I -stsplit- the data at 12, 13, 24, 25, 36, 37, 48, and 49 and
created the dummy variable. I checked results after executing commands (3)
and (4):
. list person dummy _t0 _t _d
+--------------------------------+
| person dummy _t0 _t _d |
|--------------------------------|
1. | 1 0 0 1 1 |
2. | 1 0 4 12 0 |
3. | 1 1 12 13 1 |
4. | 2 0 15 24 0 |
5. | 2 1 24 25 0 |
+--------------------------------+
Actually, I checked results after command (3), and I created the dummy
more inefficiently (using two commands) on my first take, but that's
irrelevant. We have what we want in terms of how the data are split.
Now we need to reset analysis time to be as we really want it. So first,
I just typed the original -stset- command Marjo supplied,
> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
I listed the data and saw that this didn't work. What I found was that
the original second record, calendar time (4,13] and desired analysis time
(0,9] was now itself split into two parts, and analysis time got reset
on the second part. Well, of course. Marjo was treating these data as
single-record survival data, but after the -stsplit-, what had been
single-record data no longer was. So I went back and added command (1), and then
I could -stset- what were (but are no longer) single records by specifying
id(recid). That worked. Here was the final result:
. list person dummy _t0 _t _d
+--------------------------------+
| person dummy _t0 _t _d |
|--------------------------------|
1. | 1 0 0 1 1 |
2. | 1 0 0 8 0 |
3. | 1 1 8 9 1 |
4. | 2 0 0 9 0 |
5. | 2 1 9 10 0 |
+--------------------------------+
I think that's what Marjo wants.
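Putting the pieces together, the whole sequence can be sketched as one
do-file. This is only an illustration on the three example records; typing
the data in with -input- and the extra recid column in the final -list- are
my assumptions, and the five commands are steps (1) through (5), with the
cutpoints taken as 12, 13, 24, 25, 36, 37, 48, and 49.

```stata
* Sketch only: steps (1)-(5) in one do-file, on the three example records
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end

* (1) Remember which original record (spell) each row came from
gen recid = _n

* (2) Preliminary -stset- on calendar time, multiple records per person
stset end, id(person) failure(event) enter(begin) ///
    exit(time .) time0(begin)

* (3) Split each record where a December starts and where it ends
stsplit bot, at(12 13 24 25 36 37 48 49)

* (4) Flag the pieces that begin at month 12, 24, 36, or 48
gen dummy = ( mod(bot,12)==0 & bot!=0 )

* (5) Final -stset-: the clock restarts per original record, not per row
stset end, id(recid) failure(event) time0(begin) ///
    exit(time .) origin(time begin)

* Guess and list: check _t0, _t, and _d before running any analysis
list person recid dummy _t0 _t _d
```

With your own data, drop the -input- block and start from step (1); the
point of listing at the end is to confirm that each spell's pieces share one
clock.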
I admit that this was a conceptually difficult problem, so let me emphasize
two things: First, to achieve a desired result, you can -stset- the data one
way, and then later -stset- the data differently for analysis. That was the
insight that had not occurred to Marjo. It is a trick worth remembering
whenever working with data where you want some variables defined on one
time scale (say months) and others on another (say analysis time).
-stset- based on calendar months, create what you want, and then -stset-
the data the way you really want it.
Second, the rest was just work. I admit that I seldom get an -stset- command
right the first time. My technique is to guess and list. Looking at
the result, I go back and improve my guess, and eventually I get it
right.
-- Bill
[email protected]