Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Gerard Solbrig" <gsolbrig@mail.uni-mannheim.de> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Looping within a subset under a certain condition |
Date | Sun, 30 Sep 2012 20:28:40 +0200 |
I'm sorry, but I've been trying for hours now: Stata returns "invalid syntax r(198);" every time I try to run this code:

sort cusip6 rep date
gen obs = _n
gen rep_ins = 0
egen firm_numid = group(cusip6)
summarize firm_numid, meanonly
forvalues x = 1/`r(max)' {
    su obs if firm_numid == `x' & rep == 0, meanonly
    local z1 = r(min)
    local z2 = r(max)
    su obs if firm_numid == `x' & rep == 1, meanonly
    local o1 = r(min)
    local o2 = r(max)
    forvalues i = `z1'/`z2' {
        local isin = 1
        forvalues o = `o1'/`o2' {
            if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
                local isin = 0
            }
            if `isin' == 1 replace rep_ins = 1 in `i'
        }
    }
}

Despite countless attempts and modifications, I cannot find the mistake in the syntax. I simply don't know what is wrong here. As far as I can tell, this code should work the way I need it to...

Many thanks in advance.

Gerard

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Sunday, 30 September 2012 15:36
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Looping within a subset under a certain condition

Also, you can jump out of the loops if you like with -continue- statements.

Nick

On Sun, Sep 30, 2012 at 2:31 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> The code is testing whether every case of 0 is within all the windows
> defined by cases of 1, which I thought was what you wanted.
>
> That is not what you want, it seems.
>
> If you are happy that a case of 0 is within at least one of the
> windows defined by cases of 1, then the code is different.
>
> sort firm rep trandate
>
> gen long obsno = _n
>
> * assume not in a window; will change our mind if we find an exception
> gen in_a_window = 0
>
> * numeric ids 1 2 3 ...
> * are just a convenience for looping
> egen firm_numid = group(firm_id)
> su firm_numid, meanonly
>
> * loop over firms
> forval f = 1/`r(max)' {
>
>     * within each firm, which cases have rep == 0
>     su obsno if firm_numid == `f' & rep == 0, meanonly
>     local z1 = r(min)
>     local z2 = r(max)
>
>     * ditto, rep == 1
>     su obsno if firm_numid == `f' & rep == 1, meanonly
>     local o1 = r(min)
>     local o2 = r(max)
>
>     * look at each case of rep == 0
>     forval i = `z1'/`z2' {
>         local isin = 0
>
>         * we use -trandate[`i']- and compare it with the
>         * windows for each case of rep == 1
>         forval o = `o1'/`o2' {
>             if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>                 local isin = 1
>             }
>         }
>
>         if `isin' replace in_a_window = 1 in `i'
>     }
> }
>
> If you then want to check that _all_ cases of rep==0 for each firm_id
> are within a window
>
> egen all_in_window = min(in_a_window / (rep == 0)), by(firm_id)
>
> Nick
>
> On Sun, Sep 30, 2012 at 2:05 PM, Gerard Solbrig
> <gsolbrig@mail.uni-mannheim.de> wrote:
>> (in reference to my earlier mails concerning your code and mine)
>>
>> I have given some thought to why -rep_ins- is set to 0 for all
>> observations when using your code.
>>
>> The loop runs over all rep = 1 cases and checks whether the
>> -trandate- lies within the range of each rep = 1 case.
>> With multiple rep = 1 cases that have very different dates, it might
>> find one rep = 1 case in whose range the current rep = 0
>> observation's -trandate- lies. But the loop does not stop there when
>> it finds one. It keeps going and, because the dates are sorted, it
>> inevitably finds a later rep = 1 case whose range does not contain
>> the -trandate-, and changes -rep_ins- to 0.
>>
>> Is there a way to tell the loop: stop as soon as you find that your
>> -trandate- lies in the range of a (or any) rep = 1 case and jump on
>> to the next rep = 0 case? If not, a loop might not even be the right
>> approach to this problem...
>>
>> Gerard
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Sunday, 30 September 2012 12:48
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Looping within a subset under a certain condition
>>
>> Should be
>>
>> sort firm rep trandate
>>
>> Sorry!
>>
>> On Sun, Sep 30, 2012 at 11:27 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> You are not showing me the complete line you typed, so I can't tell
>>> you what was wrong exactly.
>>>
>>> More positively, here is a stab at your problem, but I haven't
>>> tested the code.
>>>
>>> sort firm trandate rep
>>>
>>> gen long obsno = _n
>>>
>>> * assume all are in some window; will change our mind if we find an exception
>>> gen all_in_a_window = 1
>>>
>>> * numeric ids 1 2 3 ... are just a convenience for looping
>>> egen firm_numid = group(firm_id)
>>> su firm_numid, meanonly
>>>
>>> * loop over firms
>>> forval f = 1/`r(max)' {
>>>
>>>     * within each firm, which cases have rep == 0
>>>     su obsno if firm_numid == `f' & rep == 0, meanonly
>>>     local z1 = r(min)
>>>     local z2 = r(max)
>>>
>>>     * ditto, rep == 1
>>>     su obsno if firm_numid == `f' & rep == 1, meanonly
>>>     local o1 = r(min)
>>>     local o2 = r(max)
>>>
>>>     * look at each case of rep == 0
>>>     forval i = `z1'/`z2' {
>>>         local allin = 1
>>>
>>>         * we use -trandate[`i']- and compare it with the
>>>         * windows for each case of rep == 1
>>>         * note the crucial ! [!!!]
>>>         forval o = `o1'/`o2' {
>>>             if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>>>                 local allin = 0
>>>             }
>>>         }
>>>
>>>         if `allin' == 0 replace all_in_a_window = 0 in `i'
>>>     }
>>>
>>> }
>>>
>>> Nick
>>>
>>> On Sun, Sep 30, 2012 at 11:17 AM, Gerard Solbrig
>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>> I understand.
>>>> That's what I did in an earlier version of the loop,
>>>> where I subscripted both -rep- and -trandate- in my loop, but then
>>>> Stata returned:
>>>>
>>>> '[' invalid obs no
>>>> r(198);
>>>>
>>>> Why is that? That's why I got rid of it in the first place. But
>>>> without the subscript, the loop does not seem to finish running.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>> Sent: Sunday, 30 September 2012 11:59
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>
>>>> This can't be right, if only because you are misunderstanding what
>>>> the -if- command does. Stata treats
>>>>
>>>> if rep == 1
>>>>
>>>> as if it were
>>>>
>>>> if rep[1] == 1
>>>>
>>>> See
>>>>
>>>> FAQ . . . . . . . . . . . . . . . . . . . . . if command vs. if qualifier
>>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Wernow
>>>> 6/00    I have an if command in my program that only seems
>>>>         to evaluate the first observation, what's going on?
>>>>
>>>> http://www.stata.com/support/faqs/lang/ifqualifier.html
>>>>
>>>> The context of looping over observations makes no difference here.
>>>> You probably intend
>>>>
>>>> if rep[`i'] == 1
>>>>
>>>> Similar comment w.r.t.
>>>>
>>>> if trandate ...
>>>>
>>>> where -trandate- _must_ be subscripted.
>>>>
>>>>
>>>> On Sun, Sep 30, 2012 at 10:18 AM, Gerard Solbrig
>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>> That is correct. Please see my reply to Pengpeng on that matter.
>>>>> So far, I've only focused on getting the rep_ins indicator to work
>>>>> at all, but multiple windows for one firm are an additional concern.
>>>>> Ideally, the code would indicate for each rep = 0 case within which
>>>>> of these windows the observation's 'trandate' lies...
>>>>>
>>>>> Here's the last version of my code (without your earlier
>>>>> suggestion and without handling the multiple-window problem):
>>>>>
>>>>> forvalues x = 1/`max' {
>>>>>     summarize obs, meanonly
>>>>>     local N = r(N)
>>>>>     forvalues i = 1/`N' {
>>>>>         if rep == 1 {
>>>>>             local r = `i'
>>>>>             local s = `i'+1
>>>>>             forvalues z = `s'/`N' {
>>>>>                 if trandate >= wind_start[`r'] & trandate <= wind_end[`r'] {
>>>>>                     replace rep_ins = 1 in [`z']
>>>>>                 }
>>>>>                 else {
>>>>>                     replace rep_ins = 0 in [`z']
>>>>>                 }
>>>>>             }
>>>>>         }
>>>>>     }
>>>>> }
>>>>> replace rep_ins = . if rep == 1
>>>>>
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>>> Sent: Sunday, 30 September 2012 11:10
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>>
>>>>> The other thing I wasn't clear on is your rules for combining two
>>>>> or more windows for the same firm. The code example I gave just uses
>>>>> the overall range of the windows, but that would include any gaps
>>>>> between windows. Thus if a < b < c < d and there are windows [a,b]
>>>>> and [c,d] then the combined window [a, d] includes a gap [b, c].
>>>>>
>>>>> On Sun, Sep 30, 2012 at 9:56 AM, Gerard Solbrig
>>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>>> My bad, sorry! Of course, the observation 5apr2004 should not be
>>>>>> considered in the window, as it lies outside of the range between
>>>>>> 'wind_start' and 'wind_end'. Apart from that, it seems you've
>>>>>> understood my problem correctly.
>>>>>>
>>>>>> I'll try to incorporate your suggestion and see whether it leads
>>>>>> to a solution. I will post an update on the matter later.
>>>>>>
>>>>>> Thanks so far!
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>>>> Sent: Sunday, 30 September 2012 01:13
>>>>>> To: statalist@hsphsun2.harvard.edu
>>>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>>>
>>>>>> I had another look at this. I still don't understand your problem
>>>>>> exactly (e.g. why is the second obs at 5apr2004 considered in
>>>>>> window), but the technique here may help.
>>>>>>
>>>>>> egen first_start = min(wind_start), by(firm_id)
>>>>>> egen last_end = max(wind_end), by(firm_id)
>>>>>>
>>>>>> gen in_window = inrange(date, first_start, last_end)
>>>>>>
>>>>>> egen all_0_in_window = min(in_window) if rep == 0, by(firm_id)
>>>>>>
>>>>>> On the last line: on all <=> min, any <=> max, see
>>>>>>
>>>>>> FAQ . . Creating variables recording whether any or all possess some char.
>>>>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>>>>> 2/03    How do I create a variable recording whether any
>>>>>>         members of a group (or all members of a group)
>>>>>>         possess some characteristic?
>>>>>>
>>>>>> http://www.stata.com/support/faqs/data/anyall.html
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Fri, Sep 28, 2012 at 9:45 PM, Gerard Solbrig
>>>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>>>>
>>>>>>> I'm encountering a problem for which I seek your help.
>>>>>>>
>>>>>>> Let me start off with an example from my data (what I want it to
>>>>>>> look like in the end), before I explain my particular problem.
>>>>>>>
>>>>>>> firm_id   date        rep   wind_start   wind_end    rep_ins
>>>>>>>
>>>>>>> firm1     01jan2000   0     .            .           0
>>>>>>> firm1     05apr2004   0     .            .           1
>>>>>>> firm1     01nov2004   1     05may2004    30may2005   .
>>>>>>> firm1     10dec2004   0     .            .           1
>>>>>>> firm1     01jan2006   0     .            .           0
>>>>>>> firm2     30dec1999   1     03jul1999    27jul2000   .
>>>>>>> firm2     05jan2000   1     09jul1999    02aug2000   .
>>>>>>> firm2     06jun2000   0     .            .           1
>>>>>>>
>>>>>>> Each firm in my data has a 'firm_id'. Variable 'date' refers to
>>>>>>> an event date. The 'rep' dummy indicates the type of event.
>>>>>>> I set 'wind_start' and 'wind_end' as the period around the event
>>>>>>> (-180 days, +210 days) if it is a rep = 1 type event.
>>>>>>>
>>>>>>> Now, I would like the 'rep_ins' dummy to indicate (i.e., rep_ins
>>>>>>> = 1) whether the date of each other observation of this firm
>>>>>>> (where rep = 0) lies within the range determined by 'wind_start'
>>>>>>> and 'wind_end' (which is conditional upon the 'rep' dummy).
>>>>>>>
>>>>>>> I've come across looping over observations and tried to design a
>>>>>>> solution for this problem based on that, but failed to do so. I
>>>>>>> assume the solution also depends on sorting the data in a special way.
>>>>>>>
>>>>>>> Here's the first part of my .do-file:
>>>>>>>
>>>>>>> gen wind_start = date-180 if rep == 1
>>>>>>> gen wind_end = date+210 if rep == 1
>>>>>>> format wind_start %d
>>>>>>> format wind_end %d
>>>>>>> gsort +cusip6 +date +trandate
>>>>>>> gen rep_ins = 0 if rep != 1
>>>>>>>
>>>>>>> I tried to come up with a solution by adding variables 'per_start'
>>>>>>> and 'per_end' for all rep = 0, to mark the period within which
>>>>>>> the rep = 1 event can lie:
>>>>>>>
>>>>>>> gen per_start = date-180 if rep == 0
>>>>>>> gen per_end = date+180 if rep == 0
>>>>>>> format per_start %d
>>>>>>> format per_end %d
>>>>>>>
>>>>>>> Maybe this could contribute to finding a solution as well.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
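
The "at least one window" loop discussed in this thread, combined with the -continue- suggestion, can be sketched as follows. This is a minimal, hedged sketch and not code posted by either participant. The toy data, the string variable -sdate-, and the local -nfirms- are hypothetical, and -date- stands in for the -trandate- used in the real data. The r(N) guards are an assumption about one possible cause of the r(198) reported in the most recent message, which the thread does not confirm: if a firm has no rep == 0 or no rep == 1 observations, -summarize- leaves r(min) and r(max) missing, and -forvalues- would then be handed a range such as ./. .

```stata
* Hypothetical toy data mirroring the example table in the original post
clear
input str5 firm_id str9 sdate byte rep
"firm1" "01jan2000" 0
"firm1" "05apr2004" 0
"firm1" "01nov2004" 1
"firm1" "10dec2004" 0
"firm1" "01jan2006" 0
"firm2" "30dec1999" 1
"firm2" "05jan2000" 1
"firm2" "06jun2000" 0
end

gen date = daily(sdate, "DMY")
format date %td

* windows exist only for rep == 1 events, as in the original post
gen wind_start = date - 180 if rep == 1
gen wind_end   = date + 210 if rep == 1
format wind_start wind_end %td

* within each firm, rep == 0 rows come first, then rep == 1 rows
sort firm_id rep date
gen long obsno = _n

* assume not in any window; missing for the rep == 1 rows themselves
gen byte rep_ins = 0 if rep == 0

egen firm_numid = group(firm_id)
summarize firm_numid, meanonly
local nfirms = r(max)

forvalues f = 1/`nfirms' {

    * observation range of the rep == 0 cases for this firm
    summarize obsno if firm_numid == `f' & rep == 0, meanonly
    if r(N) == 0 {
        continue
    }
    local z1 = r(min)
    local z2 = r(max)

    * observation range of the rep == 1 cases (the windows)
    summarize obsno if firm_numid == `f' & rep == 1, meanonly
    if r(N) == 0 {
        continue
    }
    local o1 = r(min)
    local o2 = r(max)

    forvalues i = `z1'/`z2' {
        forvalues o = `o1'/`o2' {
            if inrange(date[`i'], wind_start[`o'], wind_end[`o']) {
                quietly replace rep_ins = 1 in `i'
                * first matching window is enough: leave the inner loop
                continue, break
            }
        }
    }
}

list firm_id date rep wind_start wind_end rep_ins, sepby(firm_id)
```

On this toy data, the rep = 0 rows dated 10dec2004 (firm1) and 06jun2000 (firm2) should be the ones flagged, while 05apr2004 stays at 0 because it falls before the window opens on 05may2004. The -continue, break- only exits the innermost loop over windows, so the outer loop moves straight on to the next rep = 0 observation.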