Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identify observations that appear in a list
From
R Zhang <[email protected]>
To
[email protected]
Subject
Re: st: Identify observations that appear in a list
Date
Fri, 14 Mar 2014 16:34:01 -0400
Hi Nick,
Your last post works perfectly. I tried to incorporate what I learned
from you in an earlier post,
http://www.stata.com/statalist/archive/2014-03/msg00232.html
That is, I tried the code below, in which I want the variable howmany to
count how many times each observation's CustomerIndustry appears in
SupplierIndustry.
+++++++++++ CODE Begins+++++
clear
input str5 CustomerIndustry str5 SupplierIndustry Input
"1000A" "4000B" 100
"1000A" "3000A" 200
"1000A" "3000B" 100
"1000B" "4000B" 50
"1000B" "2000A" 8
"4000B" "3000A" 19
"4000B" "2000A" 20
end
gen howmany = 0
quie forval i=1/`=_N' {
count if SupplierIndustry =CustomerIndustry [`i]
replace howmany=r(N) in `i'
}
++++++++CODE Ends++++++++
++++++++ERROR (See below) +++++++
quie forval i=1/`=_N' {
type mismatch
What did I do wrong (i.e., what is causing the error)?
Thanks,
R.
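
A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: on the -count- line the local macro reference is written `i] without its closing single quote, and the comparison uses = where Stata's equality operator is ==. Assuming the intent is to count, for each observation, how many values of SupplierIndustry equal that observation's CustomerIndustry, the loop might be written as:

```stata
* Sketch only: assumes the data from the -input- block above are in memory
gen howmany = 0
quietly forvalues i = 1/`=_N' {
    * compare every SupplierIndustry with the i-th CustomerIndustry;
    * note == for equality and the closing quote in `i'
    count if SupplierIndustry == CustomerIndustry[`i']
    * store the count in observation i only
    replace howmany = r(N) in `i'
}
list CustomerIndustry SupplierIndustry howmany
```

With the seven observations entered above, this would leave howmany at 0 for the 1000A and 1000B rows and at 2 for the two 4000B rows, since 4000B is the only customer industry that also appears as a supplier in that extract.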
On Fri, Mar 14, 2014 at 12:57 AM, R Zhang <[email protected]> wrote:
> Thank you for being so helpful !!!
>
> Warm regards,
>
> Rochelle
>
> On Thu, Mar 13, 2014 at 7:50 AM, Nick Cox <[email protected]> wrote:
>> This is an FAQ, at least in the sense that this is frequently asked here.
>>
>> One approach is just to -merge- the data with a reduced copy of
>> itself, with the important twist that you -rename- what you want as an
>> identifier.
>>
>> The slogan I use to remind myself of this trick is
>>
>> "-merge- is for finding intersections as well as unions"
>>
>> and you're welcome to pin or write it on a board near you.
>>
>> http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
>> is also relevant.
>>
>> . clear
>>
>> . input str5 CustomerIndustry str5 SupplierIndustry Input
>>
>> Custome~y Supplie~y Input
>> 1. 1000A 4000B 100
>> 2. 1000A 3000A 200
>> 3. 1000A 3000B 100
>> 4. 1000B 4000B 50
>> 5. 1000B 2000A 8
>> 6. 4000B 3000A 19
>> 7. 4000B 2000A 20
>> 8. 3000A 3000B 18
>> 9. 3000A 3000D 12
>> 10. 2000A 1000D 25
>> 11. end
>>
>> . save tostart
>> file tostart.dta saved
>>
>> . bysort SupplierIndustry: keep if _n == 1
>> (4 observations deleted)
>>
>> . keep SupplierIndustry
>>
>> . rename SupplierIndustry CustomerIndustry
>>
>> . merge 1:m CustomerIndustry using tostart
>>
>> Result # of obs.
>> -----------------------------------------
>> not matched 8
>> from master 3 (_merge==1)
>> from using 5 (_merge==2)
>>
>> matched 5 (_merge==3)
>> -----------------------------------------
>>
>> . tab _merge
>>
>> _merge | Freq. Percent Cum.
>> ------------------------+-----------------------------------
>> master only (1) | 3 23.08 23.08
>> using only (2) | 5 38.46 61.54
>> matched (3) | 5 38.46 100.00
>> ------------------------+-----------------------------------
>> Total | 13 100.00
>>
>> .
>> end of do-file
>>
>> . list if _merge==3
>>
>> +-------------------------------------------+
>> | Custom~y Suppli~y Input _merge |
>> |-------------------------------------------|
>> 2. | 2000A 1000D 25 matched (3) |
>> 3. | 3000A 3000B 18 matched (3) |
>> 6. | 4000B 3000A 19 matched (3) |
>> 12. | 3000A 3000D 12 matched (3) |
>> 13. | 4000B 2000A 20 matched (3) |
>> +-------------------------------------------+
>>
>> Nick
>> [email protected]
>>
>>
>> On 13 March 2014 02:12, R Zhang <[email protected]> wrote:
>>
>>> I have the following data set (HAVE); only a few observations are
>>> shown as illustration. The Input variable gives the dollar value of
>>> input sold by the supplier to the customer. You will notice that
>>> customer industries such as 4000B and 3000A also appear in
>>> SupplierIndustry. This indicates that some industries can be both
>>> suppliers and customers.
>>>
>>> +++++++++++++++++++++++
>>>
>>> HAVE
>>>
>>> CustomerIndustry  SupplierIndustry  Input
>>> 1000A             4000B             100
>>> 1000A             3000A             200
>>> 1000A             3000B             100
>>> 1000B             4000B              50
>>> 1000B             2000A               8
>>> 4000B             3000A              19
>>> 4000B             2000A              20
>>> 3000A             3000B              18
>>> 3000A             3000D              12
>>> 2000A             1000D              25
>>>
>>> +++++++++++++++++++++++
>>>
>>> I want to create a dataset that lists all customer industries that
>>> are also supplier industries, i.e., my output should appear as:
>>>
>>> CustomerIndustry  SupplierIndustry  Input
>>> 4000B             3000A              19
>>> 4000B             2000A              20
>>> 3000A             3000B              18
>>> 3000A             3000D              12
>>> 2000A             1000D              25
>>>
>>> I am asking for your help on coding this.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/