Statalist



RE: st: AW: forvalues & replace not working under two 'not equal to' conditions


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: AW: forvalues & replace not working under two 'not equal to' conditions
Date   Wed, 11 Nov 2009 17:08:32 -0000

What's missing about it? Instead of counting what is the same, count what is different. 

Nick 
[email protected] 

Martin Weiss

The only missing link is "nongroup_f", though. Any ideas?
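One possible way to close that gap without a loop, offered only as a sketch: count group sizes with -bysort- and obtain the "both different" count by inclusion-exclusion. Within a contract, the number of other observations that differ on both firm_id and nation_id is the contract size, minus those sharing the firm, minus those sharing the nation, plus those sharing both (which were subtracted twice). The helper names n_contract, n_firm, n_nation and n_both are made up for this illustration, and the sketch assumes there are no missing values in the three id variables.

```stata
clear
input byte(contract_id firm_id) str2 nation_id
1   2   "US"
1   2   "US"
4   3   "UK"
4   3   "US"
8   4   "US"
8   4   "UK"
8   3   "US"
9   5   "US"
9   4   "UK"
9   3   "US"
10  5   "US"
10  5   "US"
10  6   "NL"
10  7   "NL"
10  4   "UK"
10  4   "CH"
end

* Sizes of the nested groups that each observation belongs to
* (hypothetical helper variables, not part of the original data)
bysort contract_id: gen long n_contract = _N
bysort contract_id firm_id: gen long n_firm = _N
bysort contract_id nation_id: gen long n_nation = _N
bysort contract_id firm_id nation_id: gen long n_both = _N

* Same firm and same nation as at least one other observation
gen byte group_d = n_both > 1
* Same firm, different nation: more firm-mates than exact matches
gen byte group_f = n_firm > n_both
* Different firm, same nation: more nation-mates than exact matches
gen byte nongroup_d = n_nation > n_both
* Different firm and different nation, by inclusion-exclusion
gen byte nongroup_f = (n_contract - n_firm - n_nation + n_both) > 0

list contract_id firm_id nation_id group_d group_f nongroup_d nongroup_f, ///
    sepby(contract_id) noobs
```

On the example data this gives nongroup_f = 0 for the firm_id 4, US observation under contract_id 8 (3 - 2 - 2 + 1 = 0) and 1 for the other two observations of that contract.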

Nick Cox

If I understand this correctly, then contrary to any impressions given, I
think that a loop is a good way to proceed here, just not your loops! Here's
an example 

gen group_d = 0 

qui forval i = 1/`=_N' { 
	count if _n != `i' & contract_id == contract_id[`i'] & ///
		firm_id == firm_id[`i'] & nation_id == nation_id[`i'] 
	replace group_d = r(N) > 0 in `i' 
} 

This approach is inelegant and tedious, but it's good because you can write
code that is close to the way you are thinking. Also, you can tuck other
things in the same loop. 
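As a hedged illustration of tucking other things into the same loop, the sketch below computes all four indicators in a single pass, switching == to != where the definition asks for a difference. It uses only contracts 1 and 8 from the example data to stay short, and it assumes nation_id is held as a string variable.

```stata
clear
input byte(contract_id firm_id) str2 nation_id
1   2   "US"
1   2   "US"
8   3   "US"
8   4   "UK"
8   4   "US"
end

foreach v in group_d group_f nongroup_d nongroup_f {
    gen byte `v' = 0
}

quietly forvalues i = 1/`=_N' {
    * same firm, same nation
    count if _n != `i' & contract_id == contract_id[`i'] ///
        & firm_id == firm_id[`i'] & nation_id == nation_id[`i']
    replace group_d = r(N) > 0 in `i'

    * same firm, different nation
    count if _n != `i' & contract_id == contract_id[`i'] ///
        & firm_id == firm_id[`i'] & nation_id != nation_id[`i']
    replace group_f = r(N) > 0 in `i'

    * different firm, same nation
    count if _n != `i' & contract_id == contract_id[`i'] ///
        & firm_id != firm_id[`i'] & nation_id == nation_id[`i']
    replace nongroup_d = r(N) > 0 in `i'

    * different firm, different nation
    count if _n != `i' & contract_id == contract_id[`i'] ///
        & firm_id != firm_id[`i'] & nation_id != nation_id[`i']
    replace nongroup_f = r(N) > 0 in `i'
}

list, sepby(contract_id) noobs
```

Because every comparison is made against observation `i' directly, nothing here depends on sort order or on looking a fixed number of rows up or down within a contract.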

For variations on this theme, see 

SJ-7-4  dm0033  Speaking Stata: Counting groups, especially panels
        N. J. Cox
        Q4/07   SJ 7(4):571--581   (no commands)
        discusses how to count panels through reduction commands
        or through tabulation commands and how to overcome
        problems that do not yield easily to these approaches

SJ-7-3  pr0033  Stata tip 51: Events in intervals
        N. J. Cox
        Q3/07   SJ 7(3):440--443   (no commands)
        tip for counting or summarizing irregularly spaced
        events in intervals

SJ-7-1  pr0029  Speaking Stata: Making it count
        N. J. Cox
        Q1/07   SJ 7(1):117--130   (no commands)
        discusses count used with a loop over observations
        or variables

Nick 
[email protected] 

joe j

Thanks Martin. For now I can manage with what I have.

On Wed, Nov 11, 2009 at 2:29 PM, Martin Weiss <[email protected]> wrote:
>
> <>
>
> I am sure some combination of -duplicates tag- and -egen, group()- can get
> you there, but I am _way_ over my time limit on this one task. So I hope
> someone else can provide you with an answer.
>
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of joe j
> Sent: Wednesday, 11 November 2009 13:20
> To: [email protected]
> Subject: Re: st: AW: forvalues & replace not working under two 'not equal
> to' conditions
>
> Thank you. Yes, the definition for nongroup_f should have been what I
> wrote today, and last night in response to Tim's mail.
> The final goal is:
> (a) contract_id  (b) firm_id  (c) nation_id  (d) group_d  (e) group_f  (f) nongroup_d  (g) nongroup_f
> 1       2       US      1       0       0       0
> 1       2       US      1       0       0       0
> 4       3       UK      0       1       0       0
> 4       3       US      0       1       0       0
> 8       3       US      0       0       1       1
> 8       4       UK      0       1       0       1
> 8       4       US      0       1       1       0
> 9       3       US      0       0       1       1
> 9       4       UK      0       0       0       1
> 9       5       US      0       0       1       1
> 10      4       CH      0       1       0       1
> 10      4       UK      0       1       0       1
> 10      5       US      1       0       0       1
> 10      5       US      1       0       0       1
> 10      6       NL      0       0       1       1
> 10      7       NL      0       0       1       1
>
> And, the correct definitions of the last four 'output' variables are:
>
> (d) group_d = 1 when both firm_id and nation_id are the same for the
> given observation relative to at least one other observation with
> the same contract_id
> (e) group_f = 1 when firm_id is the same but nation_id is different for the
> given observation relative to at least one other observation with
> the same contract_id
> (f) nongroup_d = 1 when firm_id is different but nation_id is the same for
> the given observation relative to at least one other observation with
> the same contract_id
> (g) nongroup_f = 1 when both firm_id and nation_id are different for the
> given observation relative to at least one other observation with
> the same contract_id
>
> The first three variables could be derived following your logic, and
> for the last I'd see how to apply your suggestions (I'd also re-read
> Nick's paper).
>
> On Wed, Nov 11, 2009 at 12:40 PM, Martin Weiss <[email protected]> wrote:
>>
>> <>
>>
>>
>> Wait a minute! Seems to me you also changed the definition itself, which
>> triggers a different outcome for this last dummy? Anyway, provide your new
>> final goal, as you did yesterday, together with the correct definitions.
>>
>> I think you can safely omit the -forvalues- loops. Nick was not fond of them
>> yesterday, and neat solutions to such problems usually are derived from a
>> judicious combination of -bysort- and some -egen- function(s). This is
>> material covered comprehensively in Nick's seminal
>> http://www.stata-journal.com/sjpdf.html?articlenum=pr0004. Other commands
>> recently employed for insidious problems of this kind are -expand-,
>> -tempfile- and -merge-...
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On behalf of joe j
>> Sent: Wednesday, 11 November 2009 12:06
>> To: [email protected]
>> Subject: Re: st: AW: forvalues & replace not working under two 'not equal
>> to' conditions
>>
>> Just an update. I discovered that given the definition of nongroup_f
>> "as equals  1  when both firm_id and nation_id are different for the
>> given observation relative to at least one other observation within
>> the same contract_id", the following should be the correct output for
>> contract_id 8 (the columns being contract_id, firm_id, nation_id and
>> nongroup_f):
>>
>> 8       3       US      1
>> 8       4       UK      1
>> 8       4       US      0
>> Note that for firm_id 4 in the US, the value of nongroup_f should
>> be 0. (Indeed I had made a mistake in the output I posted yesterday).
>> While I will use Martin's excellent code for the other three columns
>> (group_d, etc), for the nongroup_f column alone, following Nick's
>> pointers, I found that adding to the IF clause "nation_id[_n+`i']!=."
>> in my clunky code would yield the correct result.
>>
>> forvalues i=1/`=_N'{
>> bys id_a: replace nongroup_f=1 if (firm_id~=firm_id[_n-`i']) & ///
>> (nation_id~=nation_id[_n-`i']) & (nation_id[_n-`i']!=.)
>> }
>> forvalues i=1/`=_N'{
>> bys id_a: replace nongroup_f=1 if (firm_id~=firm_id[_n+`i']) & ///
>> (nation_id~=nation_id[_n+`i']) & (nation_id[_n+`i']!=.)
>> }
>> (I know it doesn't make sense to use _N as the upper limit; I'd
>> perhaps use the number of records in the contract_id with the maximum
>> number of records. I'd also see if Martin's code could be used here as
>> well with modifications)
>>
>> Thanks again for all the help.
>>
>> On Wed, Nov 11, 2009 at 12:18 AM, joe j <[email protected]> wrote:
>>> Sorry, I should have explained it better. nongroup_f = 1  when both
>>> firm_id and nation_id are different for the given observation relative
>>> to "at least one other observation" within the same contract_id. Thus
>>> in the following case of contract_id=10, we have value 1 for all
>>> observations for the nongroup_f variable. Martin's last response gives
>>> the correct result. Thanks, joe.
>>>
>>> 10      4       CH      0       1       0       1
>>> 10      4       UK      0       1       0       1
>>> 10      5       US      1       0       0       1
>>> 10      5       US      1       0       0       1
>>> 10      6       NL      0       0       1       1
>>> 10      7       NL      0       0       1       1
>>>
>>> On Tue, Nov 10, 2009 at 11:54 PM, Tim Wade <[email protected]> wrote:
>>>> Maybe I am missing something obvious here, but I can't follow what you
>>>> are trying to do either. This criterion:
>>>>
>>>>> 4. nongroup_f = 1  when both firm_id and nation_id are different for
>>>>> two or more observations with the same contract id
>>>>
>>>> does not seem to be consistent with this line listing:
>>>>
>>>>> 10      5       US      1       0       0       1
>>>>> 10      5       US      1       0       0       1
>>>>
>>>> here are two observations with the same firm_id and nation_id yet
>>>> nongroup_f is 1. However, you may want to try looking at some
>>>> combination of -duplicates tag- and -levelsof-; this might help as an
>>>> alternative approach.
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Tue, Nov 10, 2009 at 12:08 PM, joe j <[email protected]> wrote:
>>>>> Thanks. The last 4 columns (group_d; group_f; nongroup_d; nongroup_f)
>>>>> are the final output variables. Their definitions are below the table.
>>>>>
>>>>> ******
>>>>> contract_id; firm_id; nation_id; group_d; group_f; nongroup_d; nongroup_f
>>>>> 1       2       US      1       0       0       0
>>>>> 1       2       US      1       0       0       0
>>>>> 4       3       UK      0       1       0       0
>>>>> 4       3       US      0       1       0       0
>>>>> 8       3       US      0       0       1       1
>>>>> 8       4       UK      0       1       0       1
>>>>> 8       4       US      0       1       1       1
>>>>> 9       3       US      0       0       1       1
>>>>> 9       4       UK      0       0       0       1
>>>>> 9       5       US      0       0       1       1
>>>>> 10      4       CH      0       1       0       1
>>>>> 10      4       UK      0       1       0       1
>>>>> 10      5       US      1       0       0       1
>>>>> 10      5       US      1       0       0       1
>>>>> 10      6       NL      0       0       1       1
>>>>> 10      7       NL      0       0       1       1
>>>>> ******
>>>>> 1. group_d = 1 when both firm_id and nation_id are same for two or
>>>>> more observations with the same contract id
>>>>> 2. group_f = 1  when firm_id is same but nation_id is different for
>>>>> two or more observations with the same contract id
>>>>> 3. nongroup_d = 1  when firm_id is different but nation_id is same for
>>>>> two or more observations with the same contract id
>>>>> 4. nongroup_f = 1  when both firm_id and nation_id are different for
>>>>> two or more observations with the same contract id
>>>>>
>>>>>
>>>>> On Tue, Nov 10, 2009 at 5:47 PM, Martin Weiss <[email protected]> wrote:
>>>>>>
>>>>>> <>
>>>>>>
>>>>>>
>>>>>> For clarification, you could provide the solution, i.e. the dummies that
>>>>>> you actually want to see as your final output, for your chosen example.
>>>>>> Makes it considerably easier to work towards code for you...
>>>>>>
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On behalf of joe j
>>>>>> Sent: Tuesday, 10 November 2009 17:39
>>>>>> To: [email protected]
>>>>>> Subject: Re: st: AW: forvalues & replace not working under two 'not equal
>>>>>> to' conditions
>>>>>>
>>>>>> Thanks Martin. I think I wasn't clear enough in the last mail. I was
>>>>>> not looking at various combinations of firm_id, nation_id and
>>>>>> contract_id 'for each observation'. Rather I was looking at the
>>>>>> similarity or difference of firm_id/nation_id 'between two or more
>>>>>> observations' under each contract_id.
>>>>>>
>>>>>> Based on Martin's suggestion I could derive group_d (see below). But I
>>>>>> still can't get nongroup_f right, which equals 1 (for all
>>>>>> observations) if firm_id and nation_id are different for two or more
>>>>>> observations under each contract_id (but it takes a value 1, wrongly,
>>>>>> for all observations in the data)
>>>>>>
>>>>>> *deriving group_d (this works)
>>>>>> egen groups=group(firm_id nation_id)
>>>>>>
>>>>>> bys contract_id (groups):  /*
>>>>>> */ gen byte distinctcount_group_d= /*
>>>>>> */ (groups[_n]==groups[_n+1])
>>>>>>
>>>>>> bys contract_id (groups):  /*
>>>>>> */ replace distinctcount_group_d=1 /*
>>>>>> */ if (groups[_n]==groups[_n-1])
>>>>>>
>>>>>> *2 deriving nongroup_f doesn't work (e.g. it should be 0 for contract_id=1)
>>>>>> bys contract_id (groups):  /*
>>>>>> */ gen byte distinctcount_nongroup_f= /*
>>>>>> */ (groups[_n]~=groups[_n+1]) & (nation_id[_n]~=nation_id[_n+1])
>>>>>>
>>>>>> bys contract_id (groups):  /*
>>>>>> */ replace distinctcount_nongroup_f=1 /*
>>>>>> */ if (groups[_n]~=groups[_n-1]) & (nation_id[_n]~=nation_id[_n-1])
>>>>>>
>>>>>> On Tue, Nov 10, 2009 at 4:14 PM, Martin Weiss <[email protected]> wrote:
>>>>>>>
>>>>>>> <>
>>>>>>>
>>>>>>> I think a variable denoting the combinations between the three ids is a
>>>>>>> good place to start for you:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *************
>>>>>>> clear*
>>>>>>> inp byte(contract_id firm_id) nation_id:mylabel, auto
>>>>>>> 1   2   "US"
>>>>>>> 1   2   "US"
>>>>>>> 4   3   "UK"
>>>>>>> 4   3   "US"
>>>>>>> 8   4   "US"
>>>>>>> 8   4   "UK"
>>>>>>> 8   3   "US"
>>>>>>> 9   5   "US"
>>>>>>> 9   4   "UK"
>>>>>>> 9   3   "US"
>>>>>>> 10   5   "US"
>>>>>>> 10   5   "US"
>>>>>>> 10   6   "NL"
>>>>>>> 10   7   "NL"
>>>>>>> 10   4   "UK"
>>>>>>> 10   4   "CH"
>>>>>>> end
>>>>>>>
>>>>>>> egen groups=group(contract_id firm_id nation_id)
>>>>>>>
>>>>>>> l, sepby(con) noobs
>>>>>>> *************
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [email protected]
>>>>>>> [mailto:[email protected]] On behalf of joe j
>>>>>>> Sent: Tuesday, 10 November 2009 16:04
>>>>>>> To: [email protected]
>>>>>>> Subject: st: forvalues & replace not working under two 'not equal to'
>>>>>>> conditions
>>>>>>>
>>>>>>> My dataset has three variables 1. contract_id, 2. firm_id and 3.
>>>>>>> nation_id. I want to create 4 variables, each of which gets a value of
>>>>>>> 1 if certain conditions are met. The variables I want to create are
>>>>>>> specific to the contract id, and are:
>>>>>>>
>>>>>>> 1. group_d = 1 when both firm_id and nation_id are same for two or
>>>>>>> more firms with the same contract id
>>>>>>> 2. group_f = 1  when firm_id is same but nation_id is different for
>>>>>>> two or more firms with the same contract id
>>>>>>> 3. nongroup_d = 1  when firm_id is different but nation_id is same for
>>>>>>> two or more firms with the same contract id
>>>>>>> 4. nongroup_f = 1  when both firm_id and nation_id are different for
>>>>>>> two or more firms with the same contract id
>>>>>>>
>>>>>>> The following code works well for the first three variables, but not
>>>>>>> for the last, nongroup_f; the value is 1 for all observations. I can't
>>>>>>> figure out why.
>>>>>>>
>>>>>>> This is a sample code:
>>>>>>>
>>>>>>> clear
>>>>>>> inp str10(contract_id firm_id   nation_id)
>>>>>>> 1   2   "US"
>>>>>>> 1   2   "US"
>>>>>>> 4   3   "UK"
>>>>>>> 4   3   "US"
>>>>>>> 8   4   "US"
>>>>>>> 8   4   "UK"
>>>>>>> 8   3   "US"
>>>>>>> 9   5   "US"
>>>>>>> 9   4   "UK"
>>>>>>> 9   3   "US"
>>>>>>> 10   5   "US"
>>>>>>> 10   5   "US"
>>>>>>> 10   6   "NL"
>>>>>>> 10   7   "NL"
>>>>>>> 10   4   "UK"
>>>>>>> 10   4   "CH"
>>>>>>> end
>>>>>>>
>>>>>>>
>>>>>>> *1.group_d . WORKS!
>>>>>>> gen group_d=.
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace group_d=1 if firm_id==firm_id[_n-`i'] & ///
>>>>>>> nation_id==nation_id[_n-`i']
>>>>>>> }
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace group_d=1 if firm_id==firm_id[_n+`i'] & ///
>>>>>>> nation_id==nation_id[_n+`i']
>>>>>>> }
>>>>>>>
>>>>>>> *2.group_f  WORKS!
>>>>>>> gen group_f=.
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace group_f=1 if firm_id==firm_id[_n-`i'] & ///
>>>>>>> nation_id!=nation_id[_n-`i']
>>>>>>> }
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace group_f=1 if firm_id==firm_id[_n+`i'] & ///
>>>>>>> nation_id!=nation_id[_n+`i']
>>>>>>> }
>>>>>>>
>>>>>>> *3. nongroup_d  WORKS!
>>>>>>> gen nongroup_d=.
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace nongroup_d=1 if firm_id!=firm_id[_n-`i'] & ///
>>>>>>> nation_id==nation_id[_n-`i']
>>>>>>> }
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace nongroup_d=1 if firm_id!=firm_id[_n+`i'] & ///
>>>>>>> nation_id==nation_id[_n+`i']
>>>>>>> }
>>>>>>>
>>>>>>> *4.nongroup_f DOESN'T WORK!!
>>>>>>> gen nongroup_f=.
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n-`i']) & ///
>>>>>>> (nation_id~=nation_id[_n-`i'])
>>>>>>> }
>>>>>>> forvalues i=1/`=_N'{
>>>>>>> bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n+`i']) & ///
>>>>>>> (nation_id~=nation_id[_n+`i'])
>>>>>>> }

