Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Looping across observations (forwards and backwards)
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Looping across observations (forwards and backwards)
Date: Tue, 8 Nov 2011 22:01:32 +0000

Sorry, but I am going to back off from this. I've tried and failed to
understand this twice, and I don't have the inclination to try again.
Also, it does not seem that you have tried all my suggestions,
although they were only guesses, so I don't feel obliged to try again.
The real question for you is whether there is a completely different
way for you to explain all this. These rules come from somewhere and
it's possibly a context someone will recognise if you explain it
afresh. But pushing harder at the same shut door is unlikely to get
results. It would be nice if I were wrong about that.
Nick
On Tue, Nov 8, 2011 at 9:47 PM, Pedro Nakashima
<[email protected]> wrote:
> Thanks Nick, but it didn't work.
>
> Below I put a larger sample, code that worked (for this small
> sample) and, at the end, a description of what I want to do.
>
> clear
> input v269 v270 v271 ordem novaordem sinal
> 1 1986 10 96 -96 .
> 1 1988 50 148 -148 .
> 1 1986 100 187 -187 .
> 1 1986 100 513 -513 .
> 1 1985 20 743 -743 .
> 1 1985 40 944 -944 .
> 1 1985 40 945 -945 .
> 1 1988 100 954 -954 .
> 2 1985 40 966 -966 1
> 1 1986 40 971 -971 .
> 1 1986 40 992 -992 .
> 2 1985 20 1001 -1001 1
> 0 1985 20 1019 -1019 .
> 2 1985 20 1026 -1026 -1
> 0 1985 40 1032 -1032 .
> 1 1986 100 1034 -1034 .
> 0 1985 40 1035 -1035 .
> 0 1985 40 1045 -1045 .
> 2 1986 10 1053 -1053 1
> 0 1986 40 1054 -1054 .
> 2 1986 100 1056 -1056 1
> 2 1986 40 1062 -1062 -1
> 2 1985 20 1064 -1064 -1
> 2 1985 40 1065 -1065 -1
> 1 1986 45 1068 -1068 .
> 2 1986 45 1070 -1070 1
> 2 1986 100 1074 -1074 1
> 2 1988 10 1079 -1079 0
> 2 1988 100 1081 -1081 1
> 2 1988 50 1088 -1088 1
> 0 1988 50 1091 -1091 .
> 0 1988 50 1093 -1093 .
> 2 1988 70 1094 -1094 0
> 0 1988 50 1098 -1098 .
> 2 1988 50 1099 -1099 -1
> 0 1988 10 1102 -1102 .
> 2 1988 10 1103 -1103 -1
> 0 1988 50 1104 -1104 .
> 2 1988 10 1105 -1105 -1
> 2 1988 10 1107 -1107 -1
> 2 1988 10 1110 -1110 -1
> 0 1988 50 1113 -1113 .
> 2 1988 50 1115 -1115 -1
> 2 1988 10 1116 -1116 -1
> 2 1988 10 1118 -1118 -1
> 0 1988 10 1119 -1119 .
> 2 1988 10 1120 -1120 -1
> 0 1986 40 1124 -1124 .
> 2 1986 10 1127 -1127 1
> 2 1986 10 1131 -1131 1
> 2 1986 10 1135 -1135 1
> end
> * sort time    (the variable time is not in this sample; the data are already in order)
> capture drop orde* sina*
> gen ordem = _n
> gen ordemnova = -_n
> sort ordemnova
> gen sinal2=.
>
> forvalues i=1/`=_N' {
> if v269[`i']==2 {
> local pr = v270[`i']
> local qt = v271[`i']
> local j=`i'+1
> while ((v269[`j']==2) | (v270[`j']!=`pr' | v271[`j']!=`qt')) & (`j'<=`=_N') {
> local ++j
> }
> if v269[`j']==0 {
> local ordem = -1
> }
> else if v269[`j']==1 {
> local ordem = 1
> }
> else {
> local ordem = 0
> }
> quietly replace sinal2 = `ordem' in `i'
> }
> }
> sort ordem
>
> Description:
> 1) The variable "sinal2" replicates the desired "sinal".
> 2) The first observation in which v269==2 has the pair v270==1985 and v271==40.
> I want to put one of three numbers (-1, 0 or 1) in the variable "sinal".
> What decides which one is the value of v269 in another observation: the
> one that has the same values (v270==1985 and v271==40).
> 3) To do that, I search backwards (in observations) for the pair
> v270==1985 and v271==40, skipping observations that, even though they
> have the same pair v270, v271, also have v269==2. In short, I want
> the first observation that I find when looking backwards,
> starting from an observation in which v269==2, that has either v269==0
> or v269==1.
> 4) For the first case in which v269==2 occurs, the loop goes
> backwards 2 observations (2 observations before, we have v269==1,
> v270==1985 and v271==40). Seeing this v269==1, I store the value +1 in
> the local macro "ordem" and then put it in the variable sinal.
> For the second case in which v269==2 occurs, the loop goes
> backwards 7 observations.
> For the third case, the loop goes backwards 2 observations.
> And so on.
>
> The problem is that when I run this code on a .dta file that has
> 920,000 observations, it seems the task will never end, and
> I don't think that's normal.
>
> I wonder if code without loops, as you wrote first, would be able
> to do what I described, given that it's perfectly possible 1) that we
> have consecutive observations with v269==2 and 2) that the ranges over
> which the macro j is incremented overlap among v269==2 observations.
>
> I would be grateful if someone could think through this problem with me.
> It might also be useful for other people.
>
> Best,
> Pedro.
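
A loop-free version of the rule described above can be sketched as follows. This is only a sketch under stated assumptions, not code from the thread: it assumes v269 takes only the values 0, 1 and 2, that ordem records the original observation order, and that a v269==2 observation with no earlier match gets 0, as in the loop above. The variable name lastcode is made up for illustration, and the data are an excerpt of the sample above.

```stata
clear
input v269 v270 v271 ordem sinal
1 1985 20 743 .
1 1985 40 944 .
1 1985 40 945 .
1 1988 100 954 .
2 1985 40 966 1
1 1986 40 971 .
1 1986 40 992 .
2 1985 20 1001 1
0 1985 20 1019 .
2 1985 20 1026 -1
2 1988 10 1079 0
end

* Group by the (v270, v271) pair, keeping the original order within each pair.
* lastcode (hypothetical name) holds v269 only where it is 0 or 1.
bysort v270 v271 (ordem): gen byte lastcode = v269 if v269 != 2

* replace works through observations in order, so this carries the most
* recent 0/1 code forward across any run of v269==2 observations.
by v270 v271: replace lastcode = lastcode[_n-1] if v269 == 2

* Assumption: 1 -> +1, 0 -> -1, no earlier match -> 0.
gen byte sinal2 = cond(lastcode == 1, 1, cond(lastcode == 0, -1, 0)) if v269 == 2

sort ordem
list v269 v270 v271 ordem sinal sinal2, sep(0)
```

The work here is one sort plus two passes over the data, so the cost should not depend on how far back the matching observation lies, which is presumably what makes the observation-by-observation loop slow on 920,000 observations.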
>
> 2011/10/4 Nick Cox <[email protected]>:
>> I have looked at this again. I am still not sure what you are trying
>> to do here, but this reproduces your first example:
>>
>> clear all
>> input v_269 v_270 v_271 desired_sinalt
>> 0 1.4 100 .
>> 1 1.5 100 .
>> 0 1.5 95 .
>> 0 1.4 100 .
>> 2 1.5 100 1
>> 1 1.7 98 .
>> 0 1.2 99 .
>> 2 1.5 95 -1
>> 0 1.8 101 .
>> end
>> gen long order = _n
>> gen start = v_269 == 2
>> gen block = sum(start)
>> bysort block (order) : ///
>> gen match = sum(v_270 == v_270[1] | v_271 == v_271[1])
>> by block : ///
>> replace match = sum(cond(inlist(v_269, 1, 0), v_269 * (match == 1),.))
>> by block : replace match = match[_N]
>> by block : gen sinalt = cond(match == 1, 1, cond(match == 0, -1, .)) if block
>>
>>
>>
>>
>> On Tue, Oct 4, 2011 at 3:32 PM, Nick Cox <[email protected]> wrote:
>>> I don't fully understand what you are trying to do here, but
>>>
>>> local ++j
>>>
>>> need not stop before
>>>
>>> v_270[`j']==v_270[`i'] | v_271[`j']==v_271[`i']
>>>
>>> and perhaps that is not guaranteed for all values of 2.
>>>
>>> so perhaps you need another condition to stop it, say that the next value of v_269 is 2.
>>>
>>> I think you need another approach. Evidently blocks start with some key values and then you count something within blocks. A few fragmentary suggestions
>>>
>>> gen start = v269 == 2
>>> gen block = sum(start)
>>> egen start_v269 = total(start * v269), by(block)
>>> egen start_v270 = total(start * v270), by(block)
>>> egen start_v271 = total(start * v271), by(block)
>>>
>>>
>>>
>>> Nick
>>> [email protected]
>>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:[email protected]] On Behalf Of Pedro Nakashima
>>> Sent: 03 October 2011 20:39
>>> To: [email protected]
>>> Subject: Re: st: Looping across observations (forwards and backwards)
>>>
>>> Thanks, Nick
>>>
>>> When I applied your tip to the code:
>>>
>>> clear all
>>> input v_269 v_270 v_271 desired_sinalt
>>> 0 1.4 100 .
>>> 1 1.5 100 .
>>> 0 1.5 95 .
>>> 0 1.4 100 .
>>> 2 1.5 100 1
>>> 1 1.7 98 .
>>> 0 1.2 99 .
>>> 2 1.5 95 -1
>>> 0 1.8 101 .
>>> end
>>> gen order = _n
>>> gen neworder=-_n
>>> sort neworder
>>> gen sinalt=.
>>> set trace on
>>> forvalues i=1/`=_N' {
>>> if v_269[`i']==2{
>>> local j=`i'+1
>>> while (v_270[`j']!=v_270[`i'] | v_271[`j']!=v_271[`i']) {
>>> local ++j
>>> }
>>> if v_270[`j']==v_270[`i'] | v_271[`j']==v_271[`i'] {
>>> if v_269[`j']==1{
>>> local sinal=1
>>> }
>>> else if v_269[`j']==0 {
>>> local sinal=-1
>>> }
>>> else {
>>> local sinal=.
>>> }
>>> }
>>> replace sinalt=`sinal' in `i'
>>> }
>>> }
>>> set trace off
>>> sort order
>>>
>>> it worked.
>>>
>>> But if I change the third observation as follows:
>>> replace v_269 = 2 in 3
>>> replace v_271 = 100 in 3
>>>
>>> the loop never ends.
>>>
>>> Also, It's important to say that if the criterion matches v_269 and
>>> v_271 in observation number 3 (where v_269==2), as in the above
>>> example, I want to ignore it.
>>>
>>> Thanks in advance for the help.
>>>
>>> Best regards
>>> Pedro Nakashima.
>>>
>>> 2011/9/24 Nick Cox <[email protected]>:
>>>> A different comment is that it is much easier to go forwards in Stata
>>>> than backwards. So, reversing the whole dataset, and defining spells
>>>> "started" in a certain way might be easier. When all is done you
>>>> reverse it again.
>>>>
>>>> Reversing is easy
>>>>
>>>> gen neworder = -_n
>>>> sort neworder
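
The round trip can be checked on a toy dataset. A minimal sketch: the 5-observation dataset and the variable order are made up for illustration, and the assert lines are there only to confirm the ordering at each step.

```stata
clear
set obs 5
gen long order = _n          // remember the original order
gen long neworder = -_n
sort neworder                // dataset is now reversed: order runs 5, 4, ..., 1
assert order == _N - _n + 1

* ... forward-looking work on the reversed data goes here ...

sort order                   // restore the original order
assert order == _n
```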
>>>>
>>>> On Sat, Sep 24, 2011 at 4:07 PM, Nick Cox <[email protected]> wrote:
>>>>> When your program gets to
>>>>>
>>>>> replace sinalt=`sinal' in `i'
>>>>>
>>>>> evidently `sinal' is undefined so Stata sees
>>>>>
>>>>> replace sinalt= in `i'
>>>>>
>>>>> It tries first to interpret -in- as the name of a variable or scalar,
>>>>> fails, and aborts with error.
>>>>>
>>>>> Perhaps when you coded
>>>>>
>>>>> if cod[j]==1 {
>>>>>
>>>>> you meant
>>>>>
>>>>> if cod[`j']==1 {
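
The empty-macro behaviour is easy to reproduce. A small sketch on a made-up three-observation dataset; giving the macro a default value before it is used is one possible guard, not necessarily what the poster intended.

```stata
clear
set obs 3
gen sinalt = .

local sinal                               // an empty (undefined) local macro
capture replace sinalt = `sinal' in 1     // Stata sees: replace sinalt = in 1
display "return code: " _rc               // nonzero, because the command failed

local sinal = .                           // give the macro a default value first
replace sinalt = `sinal' in 1             // now a valid command
```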
>>>>>
>>>>> On Sat, Sep 24, 2011 at 3:28 PM, pedromfn <[email protected]> wrote:
>>>>>
>>>>>> My database looks like:
>>>>>>
>>>>>> obs cod pr qt sinalt
>>>>>> 1 1 1.4 100 .
>>>>>> 2 2 1.5 100 .
>>>>>> 3 1 1.5 95 .
>>>>>> 4 1 1.4 100 .
>>>>>> 5 3 1.5 100 .
>>>>>>
>>>>>> and I want to replace observations of sinalt in which cod==3, according to
>>>>>> the following rule:
>>>>>> 1) Go across observations looking for observations in which cod=3
>>>>>> 2) In the above example, the first observation is observation 5, in which
>>>>>> pr[5]=1.5 and qt[5]=100. Once that observation was found, go backwards
>>>>>> through observations looking for the first observation j in which
>>>>>> pr[j]==pr[5] & qt[j]==qt[5]. In the example, j=2.
>>>>>> 3) Replace sinalt[5]=`sinal' , where the macro sinal is defined as:
>>>>>> if cod[j]==1, store in the local sinal the value 1
>>>>>> if cod[j]==2, store in the local sinal the value -1
>>>>>> 4) Once last replace was done, look for the next observation in which cod==3
>>>>>> and do the same thing.
>>>>>>
>>>>>> I wrote the following do-file, but it didn't work:
>>>>>>
>>>>>> forvalues i=1/`=_N' {
>>>>>> if cod[`i']==3{
>>>>>> local j=`i'-1
>>>>>> if pr[`j']==pr[`i'] & qt[`j']==qt[`i'] {
>>>>>> if cod[j]==1 {
>>>>>> local sinal 1
>>>>>> }
>>>>>> else if cod[`j']==2 {
>>>>>> local sinal -1
>>>>>> }
>>>>>> else {
>>>>>> local sinal
>>>>>> }
>>>>>> }
>>>>>> else {
>>>>>> while pr[`j']!=pr[`i'] | qt[`j']!=qt[`i'] {
>>>>>> local --j
>>>>>> }
>>>>>> }
>>>>>> replace sinalt=`sinal' in `i'
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> ERROR:
>>>>>> in not found
>>>>>> r(111);
>>>>>
>>>>