Joao Pedro W. de Azevedo
> I'm combining a few different datasets (a, b and c), and I have to do some
> recoding using date criteria. I know that although the date spells of "b"
> and "c" can overlap, they cannot overlap with "a". Moreover, I know that
> start dates are more reliable than end dates. So, in the case of serial
> number 58369 what I want to do is to substitute the end date of databases "a"
> and "b" with the day before the start date of the last spell of database
> "a". In other words, I would like to generate a loop that would substitute
> 21 Sep 01 and 01 Sep 03 by 27 Aug 01.
>
>
> Original database
> serial  data  start2     end2
> 58369   c     27 Jul 99  21 Sep 01
> 58369   b     28 Apr 97  01 Sep 03
> 58369   a     30 Oct 95  23 Jan 96
> 58369   a     04 Mar 96  06 May 96
> 58369   a     09 Dec 96  21 Apr 97
> 58369   a     28 Aug 01  01 Sep 03
> 77876   c     21 Oct 98  23 Jan 03
> 77876   b     18 Mar 02  01 Sep 03
> 77876   a     14 Jan 03  06 Jul 03
>
> Final database
> serial  data  start2     end2       end3
> 58369   c     27 Jul 99  21 Sep 01  27 Aug 01
> 58369   b     28 Apr 97  01 Sep 03  27 Aug 01
> 58369   a     30 Oct 95  23 Jan 96  .
> 58369   a     04 Mar 96  06 May 96  .
> 58369   a     09 Dec 96  21 Apr 97  .
> 58369   a     28 Aug 01  01 Sep 03  .
> 77876   c     21 Oct 98  23 Jan 03  13 Jan 03
> 77876   b     18 Mar 02  01 Sep 03  13 Jan 03
> 77876   a     14 Jan 03  06 Jul 03  .
>
>
> I was thinking about using some sort of looping structure, which would first
> identify the serial number, and then loop through all combinations for each
> case.
>
> While i<=_N (for the number of cases of each serial number)
> ** fix overlapping A claims with C
> replace end3=start2[_n+`i']-1 if end3==. &
> serial[_n]==serial[_n+`i'] & data[_n]=="a" & data[_n+`i']=="c" &
> start2[_n+`i']>start2[_n] & end2[_n]<=start2[_n+`i']
> ** fix overlapping A claims with B
> replace end3=start2[_n+`i']-1 if end3==. &
> serial[_n]==serial[_n+`i'] & data[_n]=="a" & data[_n+`i']=="b" &
> start2[_n+`i']>start2[_n] & end2[_n]<=start2[_n+`i']
> local i=`i'+1
> }
Your word description doesn't seem to match your example exactly, and
I'm not sure I've grasped all of this, but some techniques here should
help.
The key point is that this kind of problem falls out really nicely
with -by- and the right -sort- order.
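Get the last start date of "a" within each serial number:

    egen end3 = max(start2) if data == "a", by(serial)

except that we want the day before:

    replace end3 = end3 - 1

Copy this across to "b" and "c". Missing values sort last, so within each
serial number the "a" rows come first, and each missing end3 in turn picks
up the value from the observation above it:

    bysort serial (end3) : replace end3 = end3[_n-1] if mi(end3)

and drop the original:

    replace end3 = . if data == "a"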
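
To check the steps above against the example data, here is a minimal
self-contained sketch. It assumes a recent Stata in which date() accepts
the "DMY" mask. The string variables s and e are hypothetical helper names,
and the years are typed with four digits purely to avoid two-digit-year
handling. The -assert- lines compare the result with the "Final database"
in the question.

```stata
clear
input long serial str1 data str11 s str11 e
58369 c "27 Jul 1999" "21 Sep 2001"
58369 b "28 Apr 1997" "01 Sep 2003"
58369 a "30 Oct 1995" "23 Jan 1996"
58369 a "04 Mar 1996" "06 May 1996"
58369 a "09 Dec 1996" "21 Apr 1997"
58369 a "28 Aug 2001" "01 Sep 2003"
77876 c "21 Oct 1998" "23 Jan 2003"
77876 b "18 Mar 2002" "01 Sep 2003"
77876 a "14 Jan 2003" "06 Jul 2003"
end

* convert the hypothetical string dates s and e to daily dates
gen start2 = date(s, "DMY")
gen end2   = date(e, "DMY")
format start2 end2 %td

* last "a" start date within each serial, then the day before
egen end3 = max(start2) if data == "a", by(serial)
replace end3 = end3 - 1

* missings sort last, so the "a" value cascades down to "b" and "c"
bysort serial (end3) : replace end3 = end3[_n-1] if mi(end3)

* blank out end3 on the "a" rows themselves
replace end3 = . if data == "a"
format end3 %td

* compare with the "Final database" in the question
assert end3 == td(27aug2001) if serial == 58369 & data != "a"
assert end3 == td(13jan2003) if serial == 77876 & data != "a"
assert mi(end3) if data == "a"

list serial data start2 end2 end3, sepby(serial)
```

Note that -bysort- changes the order of the observations, so generate an
observation number beforehand if the original order needs to be restored.
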
Nick