Thank you. Yes, the definition for nongroup_f should have been what I
wrote today, and last night in response to Tim's mail.
The final goal is, with columns (a) contract_id, (b) firm_id, (c) nation_id,
(d) group_d, (e) group_f, (f) nongroup_d, (g) nongroup_f:

(a)  (b)  (c)  (d)  (e)  (f)  (g)
1    2    US   1    0    0    0
1    2    US   1    0    0    0
4    3    UK   0    1    0    0
4    3    US   0    1    0    0
8    3    US   0    0    1    1
8    4    UK   0    1    0    1
8    4    US   0    1    1    0
9    3    US   0    0    1    1
9    4    UK   0    0    0    1
9    5    US   0    0    1    1
10   4    CH   0    1    0    1
10   4    UK   0    1    0    1
10   5    US   1    0    0    1
10   5    US   1    0    0    1
10   6    NL   0    0    1    1
10   7    NL   0    0    1    1
And, the correct definitions of the last four 'output' variables are:
(d) group_d = 1 when both firm_id and nation_id are same for the
given observation relative to at least one other observation with
the same contract_id
(e) group_f = 1 when firm_id is same but nation_id is different for the
given observation relative to at least one other observation with
the same contract_id
(f) nongroup_d = 1 when firm_id is different but nation_id is same for the
given observation relative to at least one other observation with
the same contract_id
(g) nongroup_f = 1 when both firm_id and nation_id are different for the
given observation relative to at least one other observation with
the same contract_id
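
As a rough sketch of how definitions (d) to (g) could be computed without loops: each one asks whether "at least one other observation" of a given kind exists in the contract, so counting observations per contract, per firm, per nation and per firm-nation pair is enough. This is only one possible approach, not code posted by anyone on the list. The helper variables n_all, n_firm, n_nation and n_both are names made up here, and nation_id is assumed to be a string.

```stata
clear
input byte(contract_id firm_id) str2 nation_id
1 2 "US"
1 2 "US"
4 3 "UK"
4 3 "US"
8 4 "US"
8 4 "UK"
8 3 "US"
9 5 "US"
9 4 "UK"
9 3 "US"
10 5 "US"
10 5 "US"
10 6 "NL"
10 7 "NL"
10 4 "UK"
10 4 "CH"
end

* Counts within each contract (hypothetical helper variables);
* every count includes the observation itself
bysort contract_id: generate n_all = _N
bysort contract_id firm_id: generate n_firm = _N
bysort contract_id nation_id: generate n_nation = _N
bysort contract_id firm_id nation_id: generate n_both = _N

* (d) another observation with the same firm and the same nation
generate byte group_d = n_both > 1
* (e) same firm, different nation
generate byte group_f = n_firm > n_both
* (f) different firm, same nation
generate byte nongroup_d = n_nation > n_both
* (g) different firm and different nation, by inclusion-exclusion
generate byte nongroup_f = (n_all - n_firm - n_nation + n_both) > 0

sort contract_id firm_id nation_id
list contract_id firm_id nation_id group_d group_f nongroup_d nongroup_f, ///
    sepby(contract_id) noobs
```

For contract_id 8, firm_id 4, US the counts are n_all = 3, n_firm = 2, n_nation = 2 and n_both = 1, so nongroup_f is 3 - 2 - 2 + 1 = 0, which matches the table above.
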
The first three variables could be derived following your logic, and
for the last I'd see how to apply your suggestions (I'd also re-read
Nick's paper).
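
For the record, the reason the looped version set nongroup_f to 1 everywhere appears to be the out-of-range subscript: under -by-, firm_id[_n-`i'] is missing whenever there is no such observation in the contract, and a real value is always "not equal" to missing. The toy example below is a minimal sketch of that effect. It assumes a numerically coded nation_id, and the variable names unguarded and guarded are made up for illustration.

```stata
* Toy data: one contract with two identical rows, so nothing should differ
clear
input byte(contract_id firm_id nation_id)
1 2 1
1 2 1
end

* Unguarded: on the first row of the contract, firm_id[_n-1] is out of
* range and evaluates to missing, so both "not equal" tests are true
bysort contract_id: generate byte unguarded = ///
    (firm_id != firm_id[_n-1]) & (nation_id != nation_id[_n-1])

* Guarded: skip comparisons against a neighbour that does not exist
bysort contract_id: generate byte guarded = ///
    (firm_id != firm_id[_n-1]) & (nation_id != nation_id[_n-1]) & ///
    !missing(firm_id[_n-1])

list, noobs
```

Here unguarded is 1 on the first row and 0 on the second, while guarded is 0 on both. The two "equal to" conditions never hit this problem, because a comparison with missing then fails instead of succeeding.
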
On Wed, Nov 11, 2009 at 12:40 PM, Martin Weiss <[email protected]> wrote:
>
> <>
>
>
> Wait a minute! Seems to me you also changed the definition itself, which
> triggers a different outcome for this last dummy? Anyway, provide your new
> final goal, as you did yesterday, together with the correct definitions.
>
> I think you can safely omit the -forvalues- loops. Nick was not fond of them
> yesterday, and neat solutions to such problems are usually derived from a
> judicious combination of -bysort- and some -egen- function(s). This
> material is covered comprehensively in Nick's seminal
> http://www.stata-journal.com/sjpdf.html?articlenum=pr0004. Other commands
> recently employed for insidious problems of this kind are -expand-,
> -tempfile- and -merge-...
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of joe j
> Sent: Wednesday, November 11, 2009 12:06
> To: [email protected]
> Subject: Re: st: AW: forvalues & replace not working under two 'not equal
> to' conditions
>
> Just an update. I discovered that given the definition of nongroup_f
> "as equals 1 when both firm_id and nation_id are different for the
> given observation relative to at least one other observation within
> the same contract_id", the following should be the correct output for
> contract_id 8 (the columns being contract_id, firm_id, nation_id and
> nongroup_f):
>
> 8 3 US 1
> 8 4 UK 1
> 8 4 US 0
> Note that for firm_id 4 in the US, the value of nongroup_f should
> be 0. (Indeed I had made a mistake in the output I posted yesterday).
> While I will use Martin's excellent code for the other three columns
> (group_d, etc), for the nongroup_f column alone, following Nick's
> pointers, I found that adding to the IF clause "nation_id[_n+`i']!=."
> in my clunky code would yield the correct result.
>
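> * nation_id must be numeric for the != . guard (use != "" for a string)
> * look backwards within each contract
> forvalues i=1/`=_N'{
>     bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n-`i']) & ///
>         (nation_id~=nation_id[_n-`i']) & (nation_id[_n-`i']!=.)
> }
> * look forwards within each contract
> forvalues i=1/`=_N'{
>     bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n+`i']) & ///
>         (nation_id~=nation_id[_n+`i']) & (nation_id[_n+`i']!=.)
> }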
> (I know it doesn't make sense to use _N as the upper limit; I'd
> perhaps use the number of records in the contract_id with the maximum
> number of records. I'd also see if Martin's code could be used here as
> well with modifications)
>
> Thanks again for all the help.
>
> On Wed, Nov 11, 2009 at 12:18 AM, joe j <[email protected]> wrote:
>> Sorry, I should have explained it better. nongroup_f = 1 when both
>> firm_id and nation_id are different for the given observation relative
>> to "at least one other observation" within the same contract_id. Thus
>> in the following case of contract_id=10, we have value 1 for all
>> observations for the nongroup_f variable. Martin's last response gives
>> the correct result. Thanks, joe.
>>
>> 10 4 CH 0 1 0 1
>> 10 4 UK 0 1 0 1
>> 10 5 US 1 0 0 1
>> 10 5 US 1 0 0 1
>> 10 6 NL 0 0 1 1
>> 10 7 NL 0 0 1 1
>>
>> On Tue, Nov 10, 2009 at 11:54 PM, Tim Wade <[email protected]> wrote:
>>> Maybe I am missing something obvious here, but I can't follow what you
>>> are trying to do either. This criterion:
>>>
>>>> 4. nongroup_f = 1 when both firm_id and nation_id are different for
>>>> two or more observations with the same contract id
>>>
>>> does not seem to be consistent with this line listing:
>>>
>>>> 10 5 US 1 0 0 1
>>>> 10 5 US 1 0 0 1
>>>
>>> here are two observations with the same firm_id and nation_id yet
>>> nongroup_f is 1. However, you may want to try looking at some
>>> combinations of -duplicates, tag- and levelsof, this might help as an
>>> alternative approach.
>>>
>>> Tim
>>>
>>>
>>> On Tue, Nov 10, 2009 at 12:08 PM, joe j <[email protected]> wrote:
>>>> Thanks. The last 4 columns (group_d; group_f; nongroup_d; nongroup_f)
>>>> are the final output variables. Their definitions are below the table.
>>>>
>>>> ******
>>>> contract_id; firm_id; nation_id; group_d; group_f; nongroup_d; nongroup_f
>>>> 1 2 US 1 0 0 0
>>>> 1 2 US 1 0 0 0
>>>> 4 3 UK 0 1 0 0
>>>> 4 3 US 0 1 0 0
>>>> 8 3 US 0 0 1 1
>>>> 8 4 UK 0 1 0 1
>>>> 8 4 US 0 1 1 1
>>>> 9 3 US 0 0 1 1
>>>> 9 4 UK 0 0 0 1
>>>> 9 5 US 0 0 1 1
>>>> 10 4 CH 0 1 0 1
>>>> 10 4 UK 0 1 0 1
>>>> 10 5 US 1 0 0 1
>>>> 10 5 US 1 0 0 1
>>>> 10 6 NL 0 0 1 1
>>>> 10 7 NL 0 0 1 1
>>>> ******
>>>> 1. group_d = 1 when both firm_id and nation_id are same for two or
>>>> more observations with the same contract id
>>>> 2. group_f = 1 when firm_id is same but nation_id is different for
>>>> two or more observations with the same contract id
>>>> 3. nongroup_d = 1 when firm_id is different but nation_id is same for
>>>> two or more observations with the same contract id
>>>> 4. nongroup_f = 1 when both firm_id and nation_id are different for
>>>> two or more observations with the same contract id
>>>>
>>>>
>>>> On Tue, Nov 10, 2009 at 5:47 PM, Martin Weiss <[email protected]> wrote:
>>>>>
>>>>> <>
>>>>>
>>>>>
>>>>> For clarification, you could provide the solution, i.e. the dummies that
>>>>> you actually want to see as your final output, for your chosen example.
>>>>> Makes it considerably easier to work towards code for you...
>>>>>
>>>>>
>>>>>
>>>>> HTH
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of joe j
>>>>> Sent: Tuesday, November 10, 2009 17:39
>>>>> To: [email protected]
>>>>> Subject: Re: st: AW: forvalues & replace not working under two 'not equal
>>>>> to' conditions
>>>>>
>>>>> Thanks Martin. I think I wasn't clear enough in the last mail. I was
>>>>> not looking at various combinations of firm_id, nation_id and
>>>>> contract_id 'for each observation'. Rather I was looking at the
>>>>> similarity or difference of firm_id/nation_id 'between two or more
>>>>> observations' under each contract_id.
>>>>>
>>>>> Based on Martin's suggestion I could derive group_d (see below). But I
>>>>> still can't get nongroup_f right. It should equal 1 (for all
>>>>> observations) if firm_id and nation_id are different for two or more
>>>>> observations under each contract_id, but it wrongly takes the value 1
>>>>> for all observations in the data.
>>>>>
>>>>> *deriving group_d (this works)
>>>>> egen groups=group(firm_id nation_id)
>>>>>
>>>>> bys contract_id (groups): /*
>>>>> */ gen byte distinctcount_group_d= /*
>>>>> */ (groups[_n]==groups[_n+1])
>>>>>
>>>>> bys contract_id (groups): /*
>>>>> */ replace distinctcount_group_d=1 /*
>>>>> */ if (groups[_n]==groups[_n-1])
>>>>>
>>>>> *2 deriving nongroup_f doesn't work (e.g. it should be 0 for contract_id=1)
>>>>> bys contract_id (groups): /*
>>>>> */ gen byte distinctcount_nongroup_f= /*
>>>>> */ (groups[_n]~=groups[_n+1]) & (nation_id[_n]~=nation_id[_n+1])
>>>>>
>>>>> bys contract_id (groups): /*
>>>>> */ replace distinctcount_nongroup_f=1 /*
>>>>> */ if (groups[_n]~=groups[_n-1]) & (nation_id[_n]~=nation_id[_n-1])
>>>>>
>>>>> On Tue, Nov 10, 2009 at 4:14 PM, Martin Weiss <[email protected]> wrote:
>>>>>>
>>>>>> <>
>>>>>>
>>>>>> I think a variable denoting the combinations between the three ids is a
>>>>>> good place to start for you:
>>>>>>
>>>>>>
>>>>>>
>>>>>> *************
>>>>>> clear*
>>>>>> inp byte(contract_id firm_id) nation_id:mylabel, auto
>>>>>> 1 2 "US"
>>>>>> 1 2 "US"
>>>>>> 4 3 "UK"
>>>>>> 4 3 "US"
>>>>>> 8 4 "US"
>>>>>> 8 4 "UK"
>>>>>> 8 3 "US"
>>>>>> 9 5 "US"
>>>>>> 9 4 "UK"
>>>>>> 9 3 "US"
>>>>>> 10 5 "US"
>>>>>> 10 5 "US"
>>>>>> 10 6 "NL"
>>>>>> 10 7 "NL"
>>>>>> 10 4 "UK"
>>>>>> 10 4 "CH"
>>>>>> end
>>>>>>
>>>>>> egen groups=group(contract_id firm_id nation_id)
>>>>>>
>>>>>> l, sepby(con) noobs
>>>>>> *************
>>>>>>
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of joe j
>>>>>> Sent: Tuesday, November 10, 2009 16:04
>>>>>> To: [email protected]
>>>>>> Subject: st: forvalues & replace not working under two 'not equal to'
>>>>>> conditions
>>>>>>
>>>>>> My dataset has three variables 1. contract_id, 2. firm_id and 3.
>>>>>> nation_id. I want to create 4 variables, each of which gets a value of
>>>>>> 1 if certain conditions are met. The variables I want to create are
>>>>>> specific to the contract id, and are:
>>>>>>
>>>>>> 1. group_d = 1 when both firm_id and nation_id are same for two or
>>>>>> more firms with the same contract id
>>>>>> 2. group_f = 1 when firm_id is same but nation_id is different for
>>>>>> two or more firms with the same contract id
>>>>>> 3. nongroup_d = 1 when firm_id is different but nation_id is same for
>>>>>> two or more firms with the same contract id
>>>>>> 4. nongroup_f = 1 when both firm_id and nation_id are different for
>>>>>> two or more firms with the same contract id
>>>>>>
>>>>>> The following code works well for the first three variables, but not
>>>>>> for the last, nongroup_f; the value is 1 for all observations. I can't
>>>>>> figure out why.
>>>>>>
>>>>>> This is a sample code:
>>>>>>
>>>>>> clear
>>>>>> inp str10(contract_id firm_id nation_id)
>>>>>> 1 2 "US"
>>>>>> 1 2 "US"
>>>>>> 4 3 "UK"
>>>>>> 4 3 "US"
>>>>>> 8 4 "US"
>>>>>> 8 4 "UK"
>>>>>> 8 3 "US"
>>>>>> 9 5 "US"
>>>>>> 9 4 "UK"
>>>>>> 9 3 "US"
>>>>>> 10 5 "US"
>>>>>> 10 5 "US"
>>>>>> 10 6 "NL"
>>>>>> 10 7 "NL"
>>>>>> 10 4 "UK"
>>>>>> 10 4 "CH"
>>>>>> end
>>>>>>
>>>>>>
>>>>>> *1. group_d WORKS!
>>>>>> gen group_d=.
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace group_d=1 if firm_id==firm_id[_n-`i'] & ///
>>>>>>         nation_id==nation_id[_n-`i']
>>>>>> }
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace group_d=1 if firm_id==firm_id[_n+`i'] & ///
>>>>>>         nation_id==nation_id[_n+`i']
>>>>>> }
>>>>>>
>>>>>> *2. group_f WORKS!
>>>>>> gen group_f=.
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace group_f=1 if firm_id==firm_id[_n-`i'] & ///
>>>>>>         nation_id!=nation_id[_n-`i']
>>>>>> }
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace group_f=1 if firm_id==firm_id[_n+`i'] & ///
>>>>>>         nation_id!=nation_id[_n+`i']
>>>>>> }
>>>>>>
>>>>>> *3. nongroup_d WORKS!
>>>>>> gen nongroup_d=.
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace nongroup_d=1 if firm_id!=firm_id[_n-`i'] & ///
>>>>>>         nation_id==nation_id[_n-`i']
>>>>>> }
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace nongroup_d=1 if firm_id!=firm_id[_n+`i'] & ///
>>>>>>         nation_id==nation_id[_n+`i']
>>>>>> }
>>>>>>
>>>>>> *4. nongroup_f DOESN'T WORK!!
>>>>>> gen nongroup_f=.
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n-`i']) & ///
>>>>>>         (nation_id~=nation_id[_n-`i'])
>>>>>> }
>>>>>> forvalues i=1/`=_N'{
>>>>>>     bys contract_id: replace nongroup_f=1 if (firm_id~=firm_id[_n+`i']) & ///
>>>>>>         (nation_id~=nation_id[_n+`i'])
>>>>>> }
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/statalist/faq
>>>>>> * http://www.ats.ucla.edu/stat/stata/