Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Using a while loop to compare rows and delete them?
From
KLOSS <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Using a while loop to compare rows and delete them?
Date
Thu, 21 Jun 2012 15:09:15 +0200
Dear Nick,
Thank you for the pointers to the literature. I don't know why I didn't think of -by- earlier! The program is much faster now.
Kind Regards
Michael
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, 20 June 2012 19:55
To: [email protected]
Subject: Re: st: Using a while loop to compare rows and delete them?
I have not tried to understand your details, but my experience is that neither -while- nor -forvalues- is needed for spell problems.
I'd just like to draw your attention to previous work:
SJ-7-2 dm0029: Speaking Stata: Identifying spells. N. J. Cox.
Stata Journal 7(2): 249-265, Q2/07 (no commands).
Shows how to handle spells with complete control over spell specification.
-tsspell- from SSC:
tsspell from http://fmwww.bc.edu/RePEc/bocode/t
'TSSPELL': module for identification of spells or runs in time series.
tsspell examines the data, which must be tsset time series, to identify spells or runs, which are contiguous sequences defined by some condition. tsspell generates new variables indicating distinct spells,
Nick
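The core idea behind loop-free spell identification can be sketched as two steps: flag the row where each spell starts (the condition holds now but did not hold in the immediately preceding period for the same person), then number spells by counting those flags within each person. In Stata one would typically express these steps under a -by- prefix using subscripts such as [_n-1] and a running sum(); the sketch below is written in Python instead, purely as an illustration. The (id, t, condition) record layout and the name label_spells are hypothetical; rows are assumed to be sorted by id and t already, and a gap in t is assumed to end a spell.

```python
from itertools import groupby
from typing import List, Optional, Tuple

# Hypothetical record layout: (person id, integer time, condition holds?)
Record = Tuple[int, int, bool]


def label_spells(rows: List[Record]) -> List[Optional[int]]:
    """Number spells (runs of consecutive periods where the condition holds)
    within each id: 1, 2, ... per id, and None where the condition is false.

    Rows must already be sorted by (id, t). A gap in t ends a spell.
    """
    labels: List[Optional[int]] = []
    for _, person in groupby(rows, key=lambda r: r[0]):
        spell = 0                        # spell counter restarts for each id
        prev_cond, prev_t = False, None
        for _, t, cond in person:
            # A spell starts where the condition holds but did not hold
            # in the immediately preceding period (or there is a gap in t).
            if cond and (not prev_cond or t != prev_t + 1):
                spell += 1
            labels.append(spell if cond else None)
            prev_cond, prev_t = cond, t
    return labels


rows = [(1, 1, True), (1, 2, True), (1, 3, False), (1, 4, True),
        (2, 1, True), (2, 2, False), (3, 1, True), (3, 3, True)]
print(label_spells(rows))  # [1, 1, None, 2, 1, None, 1, 2]
```

Each row is visited once, so the work grows linearly with the number of observations; no row-by-row deletion from the full data set is needed.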
On Wed, Jun 20, 2012 at 4:39 PM, KLOSS <[email protected]> wrote:
> I am working with spell data on employment histories (1 row = 1 episode = 1 observation; each spell is subdivided into several "episodes"). My task is to identify rows that refer to the same person and the same period (say, May 5, 2002 to May 21, 2002). Let's call such observations "parallel". I then have to compare the employment statuses recorded in these parallel observations. Depending on some predefined rules, one or the other of the parallel observations should be dropped.
>
> This has to be done for all rows in the data set and for all possible combinations of parallel observations.
>
> Using Stata/SE 12.0, I start with 20,014,607 rows. I then use a -while- loop to check all these rules for all observations (see the code below). I know that a -while- loop is not the fastest way to get results, but I did not manage to get a -forvalues- loop to do the same. With the -while- loop, the program ran far too long. When I interrupted it, exactly 19,999,999 rows remained in the data set.
>
> These are my questions:
> (1) Is the count of 19,999,999 remaining rows just a coincidence, or is it the result of some limit of the -while- loop?
> (2) Is there a faster procedure for this problem?
>
>
> My code is as follows:
>
> --- CODE START ---
>
> /*
> Data structure: A running spell is subdivided into 2 episodes at the date another spell of the same person (identified via variable "id") begins or ends. The begin and end dates of the original spell are called "begorig" and "endorig" and are written in every episode of this spell. The begin and end dates of an episode are called "begepi" and "endepi". Hence, two episodes are parallel if they show the same id-value and the same begepi-value.
> Within parallel episodes, observations are sorted as: employment (status==1) - training (status==4) - unemployed with benefit (status==5 & benefit==1) - unemployed without benefit (status==5 & benefit==0).
> */
>
> gsort id begepi status -benefit   // -benefit: benefit==1 must precede benefit==0 (SITUATION 2)
>
> local i = 1    // index of the current row
> local N = _N   // current number of observations
>
> while `i' < `N' {
>     local j = `i' + 1
>     local advance = 1   // move on to the next i unless row i gets dropped
>     while `j' <= `N' {
>         if begepi[`i'] != begepi[`j'] | id[`i'] != id[`j'] {  /* consider only parallel episodes */
>             continue, break
>         }
>         if status[`i']==1 & status[`j']==4 {  /* SITUATION 1 */
>             drop in `i'
>             local N = `N' - 1
>             local advance = 0   // row i now holds the next observation
>             continue, break
>         }
>         if status[`i']==5 & status[`j']==5 & /*
>         */ benefit[`i']==1 & benefit[`j']==0 {  /* SITUATION 2 */
>             drop in `j'
>             local N = `N' - 1
>             continue            // row j now holds the next observation
>         }
>         if status[`i']<=4 & status[`j']==5 & /*
>         */ begorig[`i']>=begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
>         */ endorig[`i']-begorig[`i']<=14 {  /* SITUATION 3 */
>             drop in `i'
>             local N = `N' - 1
>             local advance = 0
>             continue, break
>         }
>         if status[`i']<=4 & status[`j']==5 & /*
>         */ begorig[`i']<begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
>         */ endorig[`i']-begorig[`j']<=14 {  /* SITUATION 4 */
>             drop in `i'
>             local N = `N' - 1
>             local advance = 0
>             continue, break
>         }
>         if status[`i']<=4 & status[`j']==5 & /*
>         */ begorig[`i']>begorig[`j'] & endorig[`i']>=endorig[`j'] & /*
>         */ endorig[`j']-begorig[`i']<=30 {  /* SITUATION 5 */
>             drop in `j'
>             local N = `N' - 1
>             continue
>         }
>         local j = `j' + 1
>     }
>     if `advance' local i = `i' + 1   // also advances when the data end inside a parallel group
> }
>
>
> --- CODE END ---
>
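Parallel episodes can only ever be compared with rows that share the same id and begepi, so the pairwise rules can be applied group by group instead of repeatedly running -drop in- on the full 20-million-row data set, which is likely what made the loop so slow. This is the restructuring that the -by- prefix enables in Stata. The sketch below illustrates it in Python rather than Stata. The Episode record, the function names, and the integer day-number dates are hypothetical; the rule conditions are transcribed in order from SITUATIONS 1-5 of the posted code, assuming (as its comment states) that benefit==1 sorts before benefit==0 within status 5.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Episode:
    # Hypothetical record mirroring the variables in the posted code;
    # dates are assumed to be integer day numbers.
    id: int
    begepi: int
    status: int
    benefit: int
    begorig: int
    endorig: int


def rule(a: Episode, b: Episode) -> str:
    """Compare an earlier row a with a later parallel row b.
    Return "i" to drop a, "j" to drop b, or "" to keep both.
    Conditions follow SITUATIONS 1-5 in the posted order."""
    if a.status == 1 and b.status == 4:                                         # 1
        return "i"
    if a.status == 5 and b.status == 5 and a.benefit == 1 and b.benefit == 0:  # 2
        return "j"
    if a.status <= 4 and b.status == 5:
        if (a.begorig >= b.begorig and a.endorig <= b.endorig
                and a.endorig - a.begorig <= 14):                               # 3
            return "i"
        if (a.begorig < b.begorig and a.endorig <= b.endorig
                and a.endorig - b.begorig <= 14):                               # 4
            return "i"
        if (a.begorig > b.begorig and a.endorig >= b.endorig
                and b.endorig - a.begorig <= 30):                               # 5
            return "j"
    return ""


def resolve_group(rows: List[Episode]) -> List[Episode]:
    """Run the pairwise loop inside one (id, begepi) group only."""
    rows = list(rows)
    i = 0
    while i < len(rows) - 1:
        j = i + 1
        dropped_i = False
        while j < len(rows):
            action = rule(rows[i], rows[j])
            if action == "i":
                del rows[i]          # the next row moves into position i
                dropped_i = True
                break
            if action == "j":
                del rows[j]          # the next row moves into position j
                continue
            j += 1
        if not dropped_i:
            i += 1
    return rows


def resolve_all(episodes: List[Episode]) -> List[Episode]:
    """Sort with benefit==1 before benefit==0, then resolve each
    (id, begepi) group independently and collect the survivors."""
    ordered = sorted(episodes, key=lambda e: (e.id, e.begepi, e.status, -e.benefit))
    kept: List[Episode] = []
    for _, group in groupby(ordered, key=lambda e: (e.id, e.begepi)):
        kept.extend(resolve_group(list(group)))
    return kept


data = [Episode(1, 100, 1, 0, 90, 110), Episode(1, 100, 4, 0, 95, 120),
        Episode(2, 200, 5, 0, 180, 220), Episode(2, 200, 5, 1, 190, 210)]
for e in resolve_all(data):
    print(e)  # keeps the training episode for id 1 and the benefit==1 episode for id 2
```

The quadratic pairwise comparison now runs only over the handful of episodes that share an id and begepi, and rows are removed from a small list rather than from the whole data set.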
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
____________
The ifo Dresden Branch is part of:
ifo Institut - Leibniz-Institut fuer Wirtschaftsforschung an der Universitaet Muenchen e.V.
Poschingerstr. 5, 81679 Muenchen,
Registered office: Munich; register of associations no. 4419, Munich Local Court;
Executive Board: Prof. Dr. Dres. h.c. Hans-Werner Sinn (President), Meinhard Knoche;
Tax number 143/217/10159, VAT ID no. DE129516729