I don't see why your time points will vary by id. Isn't the point to
apply a categorisation consistently to all panels?
Nick
[email protected]
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Thomas Speidel
Sent: 16 June 2009 16:29
To: [email protected]
Subject: Re: st: RE: Computing and allocating time intervals in a wide dataset
Building on Nick's suggestion, I am trying to modify the code to solve
a slightly different problem. Suppose again my data is as follows:
id activity start stop event1 event2 event3 event4
1 1 11 18 10 . 38 44
1 2 21 25 10 . 38 44
1 3 25 28 10 . 38 44
1 4 28 32 10 . 38 44
1 5 32 40 10 . 38 44
1 6 40 44 10 . 38 44
2 1 8 18 13 23 . 30
2 2 23 24 13 23 . 30
Except this time instead of having fixed timepoints (i.e. 0.5 17.5
24.5 44.5 64.5 81), I have to use the "event" variables, so that I can
compute the interval between start and stop and allocate that to its
corresponding event:
id activity yr_1 yr_2 yr_3 yr_4
1 1 . . . .
1 2 . . . .
1 3 . . . .
1 4 . . . .
1 5 . . . 2
1 6 . . . 4
2 1 5 5 . .
2 2 . 1 . .
So for example, for (id==2 & activity==1):
yr_1 = min(stop, event1) - max(start, 0.5) = 13 - 8 = 5
and for (id==2 & activity==2):
yr_2 = min(stop, event2) - max(start, event1) = 23 - 23 = 0 (for
consistency 0 years become 1)
Nick suggested using -tokenize- on my fixed timepoints in my previous
problem. However, since my timepoints now vary by id, I suspect
-tokenize- will not be of use here.
Thanks
Thomas Speidel
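
A minimal sketch of one way this could be handled, assuming the data are in
long form as in the example above. Because event1-event4 are variables, they
can be referenced directly inside the loop where the earlier code used the
tokenized constants, so the boundaries are free to differ from one id to the
next. The helper variable event0 and the rules for missing boundaries and
negative results are assumptions inferred from the desired table, not
something stated in the thread.

```stata
clear
input id activity start stop event1 event2 event3 event4
1 1 11 18 10  . 38 44
1 2 21 25 10  . 38 44
1 3 25 28 10  . 38 44
1 4 28 32 10  . 38 44
1 5 32 40 10  . 38 44
1 6 40 44 10  . 38 44
2 1  8 18 13 23  . 30
2 2 23 24 13 23  . 30
end

* Assumption: the first interval starts at 0.5, as in the yr_1 example.
gen event0 = 0.5

forvalues j = 1/4 {
    local i = `j' - 1
    * min() and max() ignore missing arguments, so exclude explicitly any
    * row in which start, stop or either boundary is missing.
    gen yr_`j' = min(stop, event`j') - max(start, event`i') ///
        if !missing(start, stop, event`i', event`j')
    * Assumption: no overlap with the interval is recorded as missing.
    replace yr_`j' = . if yr_`j' < 0
    * "For consistency 0 years become 1".
    replace yr_`j' = 1 if yr_`j' == 0
}

drop event0
list id activity yr_*, sepby(id)
```

Worked through by hand on the eight example rows, this gives yr_4 = 2 and 4
for activities 5 and 6 of id 1, yr_1 = yr_2 = 5 for activity 1 of id 2, and
yr_2 = 1 for activity 2 of id 2, with everything else missing. If a missing
event should instead fall back to the previous non-missing event, the if
condition would need to change.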
Quoting Nick Cox <[email protected]> Tue 9 Jun 10:40:40 2009:
> I don't understand the reluctance to -reshape-. I am going to assume
> that you do that.
>
> Your example suggests as code
>
> tokenize 0.5 17.5 24.5 44.5 64.5 81
> qui forval i = 1/5 {
> local j = `i' + 1
> gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
> if start < . & stop < .
> }
> l
>
> Here are the results:
>
> . l
>
> +------------------------------+
> | id activity start stop |
> |------------------------------|
> 1. | 1 1 6 15 |
> 2. | 1 2 22 25 |
> 3. | 1 3 15 16 |
> 4. | 1 4 22 28 |
> 5. | 1 5 30 . |
> |------------------------------|
> 6. | 1 6 . . |
> 7. | 2 1 53 69 |
> 8. | 2 2 69 79 |
> +------------------------------+
>
> . tokenize 0.5 17.5 24.5 44.5 64.5 81
>
> . qui forval i = 1/5 {
> 2. local j = `i' + 1
> 3. gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
>> if start < . & stop < .
> 4. }
>
> . l
>
>      +----------------------------------------------------------------------+
>      | id   activity   start   stop   grp_1   grp_2   grp_3   grp_4   grp_5 |
>      |----------------------------------------------------------------------|
>   1. |  1          1       6     15       9       0       0       0       0 |
>   2. |  1          2      22     25       0     2.5      .5       0       0 |
>   3. |  1          3      15     16       1       0       0       0       0 |
>   4. |  1          4      22     28       0     2.5     3.5       0       0 |
>   5. |  1          5      30      .       .       .       .       .       . |
>      |----------------------------------------------------------------------|
>   6. |  1          6       .      .       .       .       .       .       . |
>   7. |  2          1      53     69       0       0       0    11.5     4.5 |
>   8. |  2          2      69     79       0       0       0       0      10 |
>      +----------------------------------------------------------------------+
>
> Nick
> [email protected]
>
> Thomas Speidel
>
> I am attempting to compute several time points to calculate the
> interval (years) between the start and the end of an activity and to
> assign that interval to its relevant age group. For example, given
> the following dataset:
>
> id activity start stop
> 1 1 6 15
> 1 2 22 25
> 1 3 15 16
> 1 4 22 28
> 1 5 30 .
> 1 6 . .
> 2 1 53 69
> 2 2 69 79
>
> I am trying to derive the following:
>
> id activity start stop grp_0_17 grp_1~24 grp_2~44 grp_4~64 grp_6~81
> 1 1 6 15 9 0 0 0 0
> 1 2 22 25 0 2.5 .5 0 0
> 1 3 15 16 1 0 0 0 0
> 1 4 22 28 0 2.5 3.5 0 0
> 1 5 30 . 0 0 1 0 0
> 1 6 . . . . . . .
> 2 1 53 69 0 0 0 11.5 4.5
> 2 2 69 79 0 0 0 0 10
>
> The age groups are:
> [0.5, 17.5]
> [17.6, 24.5]
> [24.6, 44.5]
> [44.6, 64.5]
> [64.6, 81]
>
> If the dataset was in long format as above, it would not be terribly
> hard. To slightly complicate things is the fact that the interval may
> need to be correctly allocated when it falls between two or more age
> groups. However, my data is in wide format (single observation per
> row) making it a nightmare to even check or troubleshoot my code (I
> have 40 activities per id), and the data is so large that I am
> reluctant to reshape it.
> This is what the dataset above would look like:
>
> id start1 stop1 start2 stop2 start3 stop3 start4 stop4 start5 stop5 start6 stop6
> 1 6 15 22 25 15 16 22 28 30 . . .
> 2 53 69 69 79 . . . . . . . .
>
> -The activities do not necessarily follow a temporal sequence (e.g.
> 3rd observation on top)
> -While the example does not show that, every id has exactly 40
> activities, even though many of them may be completely missing.
> -Whenever a start is present but its corresponding stop is missing (as
> in the 5th obs. on top), it means that at the time of the study the
> person was still performing that activity, hence stop would be a
> variable called ageref. If start==ageref, then the interval would be
> approximated as 1 year.
>
> I would appreciate any feedback on how to best tackle this problem.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/