Hi Bill,
thanks for your beautiful solution; it works perfectly.
A small remark. Shouldn't the lines (3) and (4) be replaced by
the following lines:
. stsplit bot, at(11 12 23 24 35 36 47 48) (3)
. gen dummy=(mod(_t,12)==0) (4)
because isn't it the interval (11,12] that contains
December and not the interval (12,13]?
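
To check my reasoning, here is a small sketch in Python rather than Stata. The helper names (split_spell, month_label, flagged_months) are my own and are not anything -stsplit- provides. It assumes that month m corresponds to the interval (m-1, m], that bot holds the lower bound of each split piece, and that _t holds the upper bound.

```python
def split_spell(begin, end, cuts):
    """Split the interval (begin, end] at every cut point strictly inside it."""
    points = [begin] + sorted(c for c in cuts if begin < c < end) + [end]
    return list(zip(points[:-1], points[1:]))


def month_label(m):
    """Map month number m (1 = Jan 1995) to a (year, month-of-year) pair."""
    return 1995 + (m - 1) // 12, (m - 1) % 12 + 1


def flagged_months(begin, end, cuts, is_flagged):
    """Return the calendar months covered by the pieces that get dummy == 1."""
    months = []
    for t0, t1 in split_spell(begin, end, cuts):
        if is_flagged(t0, t1):
            # The piece (t0, t1] covers month numbers t0+1, ..., t1.
            months.extend(month_label(m) for m in range(t0 + 1, t1 + 1))
    return months


def original_rule(t0, t1):
    # Mirrors dummy = (mod(bot,12)==0 & bot!=0), with bot = lower bound t0.
    return t0 % 12 == 0 and t0 != 0


def proposed_rule(t0, t1):
    # Mirrors dummy = (mod(_t,12)==0), with _t = upper bound t1.
    return t1 % 12 == 0


# Cut points described in the explanation versus the ones proposed here.
cuts_original = [12, 13, 24, 25, 36, 37, 48, 49]
cuts_proposed = [11, 12, 23, 24, 35, 36, 47, 48]

# Second spell of person 1: calendar time (4, 13].
print(flagged_months(4, 13, cuts_original, original_rule))  # [(1996, 1)]
print(flagged_months(4, 13, cuts_proposed, proposed_rule))  # [(1995, 12)]
```

Under these assumptions the original cut points flag January 1996 for that spell, while the shifted ones flag December 1995.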
-Marjo
William Gould, StataCorp LP (8.4.2008 19:58):
>Marjo Pyy-Martikainen <[email protected]> writes,
>
>> I have a data containing multiple spells per person. The spells are measured
>> in months. The data is in the following form:
>>
>> PERSON BEGIN END EVENT DUR
>> 1 ( 0 1 ] 1 1
>> 1 ( 4 13 ] 1 9
>> 2 ( 15 25 ] 0 10
>>
>> where variables BEGIN and END are measured in calendar time (1 refers to Jan
>> 1995, 2 to Feb 1995 and so on until 60, Dec 1999).
>>
>> I stset the data in the following way:
>>
>> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
>>
>> which means I want to "set the clock to zero" at the start of each spell.
>> Now I would like to include a dummy for December months 12, 24, 36 and 48.
>> It is thus a time-varying variable getting value 1 for the December months
>> and 0 for other months. A spell may include zero, one or many December
>> months. I suppose I should use stsplit and do some kind of episode
>> splitting, but could someone help me and give me advice how I should do it
>> with my data?
>
>I have the solution. Before starting, let's look and see what Marjo has
>already done. At first I thought Marjo had made a mistake, but I was wrong.
>The -stset- command is just complicated enough that theoretical examination
>does not work well; you can check that you have the intended result by listing
>the _t0, _t, and _d variables that -stset- creates. So I entered the data
>and typed the -stset- command. Then I typed
>
> . list _t0 _t _d
>
> +---------------+
> | _t0 _t _d |
> |---------------|
> 1. | 0 1 1 |
> 2. | 0 9 1 |
> 3. | 0 10 0 |
> +---------------+
>
>Analysis time ranges over (0,1] and again over (0,9] for the first person.
>That's what I thought would happen, and it looks like an error, but notice
>that Marjo said, "which means I want to set the clock to zero at the start of
>each spell". Okay, the command works exactly as Marjo said it would.
>
>Marjo now wants to add a dummy variable equal to 1 every December.
>Without explanation (that's coming), here's the solution:
>
> . gen recid = _n (1)
>
> . stset end, id(person) failure(event) enter(begin) /// (2)
> exit(time .) time0(begin)
> . stsplit bot, at(12 13 24 25 36 37 48 49) (3)
> . gen dummy = ( mod(bot,12)==0 & bot!=0 ) (4)
>
> . stset end, id(recid) failure(event) time0(begin) /// (5)
> exit(time .) origin(time begin)
>
>I admit that the entire solution did not occur to me at the outset. In fact, I
>went back and added the first line at the end, and modified the fifth. Here is
>what did occur to me: We will have to use -stsplit-. -stsplit- wants to split
>on analysis time, so we will have first to -stset- our data based on calendar
>time, then -stsplit- the data, and finally we can -stset- our data the way we
>really want it. The preliminary -stset- would allow us to generate the
>dummy variable for December.
>
>So let me explain.
>Ignore line (1); remember, it didn't even occur to me until later.
>
>Line (2) was the first line I wrote. It seemed the right way to -stset-
>the data based on calendar time. I didn't get the command right the
>first time, but after typing (2), I listed the data, saw what was wrong,
>and eventually got (2) to work just as I wanted it. (What was wrong is that I
>forgot exit(time .), which is needed because this data, it turned out, had to
>be treated as multiple-failure data at this step.) When I say I listed the
>data, what I do is list _t0, _t, and _d, so I can see the time variables and
>outcome that will be used in analysis. Here's what the data looked like after (2):
>
> . list person _t0 _t _d
>
> +------------------------+
> | person _t0 _t _d |
> |------------------------|
> 1. | 1 0 1 1 |
> 2. | 1 4 13 1 |
> 3. | 2 15 25 0 |
> +------------------------+
>
>Perfect; _t0 and _t correspond to the original month variables.
>Now we can -stsplit-. We need to set the dummy to 1 for months 12, 24, 36,
>and 48, which means we need to set it back to 0 for months 13, 25, 37, and
>49. So I -stsplit- the data as 12, 13, 24, 25, 36, 37, 48, and 49 and
>created the dummy variable. I checked results after executing commands (3)
>and (4):
>
> . list person dummy _t0 _t _d
>
> +--------------------------------+
> | person dummy _t0 _t _d |
> |--------------------------------|
> 1. | 1 0 0 1 1 |
> 2. | 1 0 4 12 0 |
> 3. | 1 1 12 13 1 |
> 4. | 2 0 15 24 0 |
> 5. | 2 1 24 25 0 |
> +--------------------------------+
>
>Actually, I checked results after command (3), and I created the dummy
>more inefficiently (using two commands) on my first take, but that's
>irrelevant. We have what we want in terms of how the data are split.
>Now we need to reset analysis time to be as we really want it. So first,
>I just typed the original -stset- command Marjo supplied,
>
>> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
>
>I listed the data and saw that this didn't work. What I found was that
>the original second record, with calendar time (4,13] and desired analysis
>time (0,9], was now itself split into two parts, and analysis time got reset
>on the second part. Well, of course. Marjo was treating this data as
>single-record survival data, but after the -stsplit-, what was single record
>data was no longer. So I went back and added command (1), and then
>I could set what were (but are no longer) single records by specifying
>id(recid). That worked. Here was the final result:
>
> . list person dummy _t0 _t _d
>
> +--------------------------------+
> | person dummy _t0 _t _d |
> |--------------------------------|
> 1. | 1 0 0 1 1 |
> 2. | 1 0 4 12 0 |
> 3. | 1 1 12 13 1 |
> 4. | 2 0 15 24 0 |
> 5. | 2 1 24 25 0 |
> +--------------------------------+
>
>I think that's what Marjo wants.
>
>I admit that this was a conceptually difficult problem, so let me emphasize
>two things: First, to achieve a desired result, you can -stset- the data one
>way, and then later -stset- the data differently for analysis. That was the
>insight that had not occurred to Marjo. It is a trick worth remembering
>whenever working with data where you want some variables defined on one
>time scale (say months) and others on another (say analysis time).
>-stset- based on calendar months, create what you want, and then -stset-
>the data the way you really want it.
>
>The rest was just work. I admit that I seldom get an -stset- command
>right the first time. My technique is to guess and list. Looking at
>the result, I go back and improve my guess, and eventually I get it
>right.
>
>-- Bill
>[email protected]
>*
>* For searches and help try:
>* http://www.stata.com/support/faqs/res/findit.html
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/