This looks like a horrible problem.
Ken Thompson is quoted in a recent book
on Unix as saying, "When in doubt, use brute
force". In the same spirit, here is an
untested code sketch that may help.
Watch out for wrapped lines.
gen Month = month(date)
* -levels- is called -levelsof- in Stata 9 and later
levels Month, local(M)
levels Id, local(I)
generate Y = .
foreach i of local I {
    qui foreach m of local M {
        * window is months m-1, m, m+1; Y is stored on the
        * observations in month m, so each window is labelled by
        * its middle month (your interval 1, Jan-Mar, is Month 2)
        local l = `m' - 1
        local n = `m' + 1
        * next statement split over three lines;
        * -clean- returns the distinct Types in sorted order
        levels Type ///
            if Id == `i' & inlist(Month, `l',`m',`n'), ///
            local(T) clean
        replace Y = 0 if "`T'" == "" & Id == `i' & Month == `m'
        replace Y = 1 if "`T'" == "CAB" & Id == `i' & Month == `m'
        replace Y = 2 if "`T'" == "OXY" & Id == `i' & Month == `m'
        replace Y = 3 if "`T'" == "OTH" & Id == `i' & Month == `m'
        replace Y = 4 if "`T'" == "CAB OXY" & Id == `i' & Month == `m'
        replace Y = 5 if "`T'" == "CAB OTH" & Id == `i' & Month == `m'
        replace Y = 6 if "`T'" == "OTH OXY" & Id == `i' & Month == `m'
        replace Y = 7 if "`T'" == "CAB OTH OXY" & Id == `i' & Month == `m'
    }
}
Then remove duplicates within -Id Month-.
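A minimal sketch of that step, shown on a small made-up dataset rather than your real data. It assumes, as in the loop above, that every observation in the same Id-Month already carries the same Y, so it does not matter which one is kept:

```stata
* toy data: made-up values, several observations per Id-Month,
* all carrying the same Y as the loop above would leave them
clear
input Id Month Y
1 1 5
1 1 5
1 2 5
1 3 5
1 3 5
end

* keep one observation per Id-Month combination
* (-force- is required when a varlist is given)
duplicates drop Id Month, force

* check that Id and Month now identify the observations
isid Id Month
list, sepby(Id)
```

-bysort Id Month: keep if _n == 1- would do the same job.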
If you have too many values of Id for the -levels- approach
to work, check out the -egen, group()- approach documented
under http://www.stata.com/support/faqs/data/foreach.html
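The idea there, sketched on made-up data: map each distinct Id to a consecutive integer and loop over those integers with -forvalues-, so no long macro of Id values is ever built. The loop body here (a group mean) is only a stand-in; in your problem the -levels Type- block would go there, with -group == `g'- in place of -Id == `i'-.

```stata
* toy data: a few arbitrary subject identifiers (made up)
clear
input Id x
101 1
101 3
205 4
317 2
317 6
end

* map each distinct Id to a consecutive integer 1, 2, ...
egen group = group(Id)
summarize group, meanonly
local ng = r(max)

* loop over the integer codes instead of a macro of Id values
generate mean_x = .
forvalues g = 1/`ng' {
    quietly summarize x if group == `g', meanonly
    quietly replace mean_x = r(mean) if group == `g'
}
list, sepby(Id)
```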
Nick
Anthony Gichangi
> I have a dataset with three variables: ID, Date and Type. I
> have listed the data for subject number 1:
> ID Date Type
> 1 05 Jan 96 CAB
> 1 05 Jan 96 OTH
> 1 02 Feb 96 CAB
> 1 11 Mar 96 CAB
> 1 15 Apr 96 CAB
> 1 15 Apr 96 OTH
> 1 23 Jul 96 CAB
> 1 23 Jul 96 CAB
> 1 02 Sep 96 OXY
> 1 02 Sep 96 CAB
> 1 30 Sep 96 OTH
> 1 01 Nov 96 OTH
> 1 22 Nov 96 OXY
> 1 22 Nov 96 CAB
> 1 16 Dec 96 OXY
> 1 16 Dec 96 CAB
> 1 16 Dec 96 OTH
>
>
> I have defined overlapping three-month intervals (each
> overlapping the next by two months) in which I want to see
> what is happening to the subjects. Interval 1 is Jan, Feb,
> Mar; interval 2 is Feb, Mar, Apr; interval 3 is Mar, Apr,
> May; and so on until Dec. I then want to define a new
> variable Y(t) that captures the information in each
> interval as follows:
>
> Y(t) =0 if nothing
> = 1 if CAB only
> = 2 if OXY only
> = 3 if OTHERS only
> = 4 if OXY+CAB
> = 5 if CAB+OTHERS
> = 6 if OXY+OTHERS
> = 7 if OXY+CAB+OTHERS
>
> The final dataset should then look like this:
> ID interval Y(t)
> 1 1 5
> 1 2 1
> 1 3 5
> .
> .
> .
> Any ideas how I can do this in Stata?