Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identify observations that appear in a list
From: R Zhang <[email protected]>
To: [email protected]
Subject: Re: st: Identify observations that appear in a list
Date: Fri, 14 Mar 2014 00:57:56 -0400
Thank you for being so helpful!
Warm regards,
Rochelle
On Thu, Mar 13, 2014 at 7:50 AM, Nick Cox <[email protected]> wrote:
> This is an FAQ, at least in the sense that this is frequently asked here.
>
> One approach is just to -merge- the data with a reduced copy of
> itself, with the important twist that you -rename- what you want as an
> identifier.
>
> The slogan I use to remind myself of this trick is
>
> "-merge- is for finding intersections as well as unions"
>
> and you're welcome to pin or write it on a board near you.
>
> http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
> is also relevant.
>
> . clear
>
> . input str5 CustomerIndustry str5 SupplierIndustry Input
>
> Custome~y Supplie~y Input
> 1. 1000A 4000B 100
> 2. 1000A 3000A 200
> 3. 1000A 3000B 100
> 4. 1000B 4000B 50
> 5. 1000B 2000A 8
> 6. 4000B 3000A 19
> 7. 4000B 2000A 20
> 8. 3000A 3000B 18
> 9. 3000A 3000D 12
> 10. 2000A 1000D 25
> 11. end
>
> . save tostart
> file tostart.dta saved
>
> . bysort SupplierIndustry: keep if _n == 1
> (4 observations deleted)
>
> . keep SupplierIndustry
>
> . rename SupplierIndustry CustomerIndustry
>
> . merge 1:m CustomerIndustry using tostart
>
> Result # of obs.
> -----------------------------------------
> not matched 8
> from master 3 (_merge==1)
> from using 5 (_merge==2)
>
> matched 5 (_merge==3)
> -----------------------------------------
>
> . tab _merge
>
> _merge | Freq. Percent Cum.
> ------------------------+-----------------------------------
> master only (1) | 3 23.08 23.08
> using only (2) | 5 38.46 61.54
> matched (3) | 5 38.46 100.00
> ------------------------+-----------------------------------
> Total | 13 100.00
>
> .
> end of do-file
>
> . list if _merge==3
>
> +-------------------------------------------+
> | Custom~y Suppli~y Input _merge |
> |-------------------------------------------|
> 2. | 2000A 1000D 25 matched (3) |
> 3. | 3000A 3000B 18 matched (3) |
> 6. | 4000B 3000A 19 matched (3) |
> 12. | 3000A 3000D 12 matched (3) |
> 13. | 4000B 2000A 20 matched (3) |
> +-------------------------------------------+
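>
> The log above only lists the matched observations. To end up with a
> dataset containing just those rows, the same steps could be condensed
> into a short do-file along the following lines. This is a minimal
> sketch, assuming tostart.dta was saved as shown above and that your
> Stata is recent enough for the -keep()- and -nogenerate- options of
> -merge- (Stata 11 or later).
>
> ```stata
> * Start again from the saved copy of the full data
> use tostart, clear
>
> * Reduce to one observation per distinct supplier industry
> bysort SupplierIndustry: keep if _n == 1
> keep SupplierIndustry
>
> * Rename so the supplier codes can serve as the merge key
> rename SupplierIndustry CustomerIndustry
>
> * Keep only customers that also appear as suppliers (the intersection)
> * and skip creating the _merge variable
> merge 1:m CustomerIndustry using tostart, keep(match) nogenerate
>
> sort CustomerIndustry SupplierIndustry
> list, sepby(CustomerIndustry)
> ```
>
> With the example data this should leave the same five observations
> that -list if _merge==3- shows above.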
>
> Nick
> [email protected]
>
>
> On 13 March 2014 02:12, R Zhang <[email protected]> wrote:
>
>> I have the following data set (HAVE); only a few observations are
>> shown as illustration. The Input variable gives the dollar value of
>> input sold by the supplier to the customer. You will notice that
>> customer industries 4000B and 3000A also appear in SupplierIndustry.
>> This indicates that some industries can be both suppliers and customers.
>>
>> +++++++++++++++++++++++
>>
>> HAVE
>>
>> CustomerIndustry  SupplierIndustry  Input
>> 1000A             4000B             100
>> 1000A             3000A             200
>> 1000A             3000B             100
>> 1000B             4000B             50
>> 1000B             2000A             8
>> 4000B             3000A             19
>> 4000B             2000A             20
>> 3000A             3000B             18
>> 3000A             3000D             12
>> 2000A             1000D             25
>>
>> +++++++++++++++++++++++
>>
>> I want to create a dataset that lists all customer industries that are
>> also supplier industries, i.e., my output should appear as:
>>
>> CustomerIndustry  SupplierIndustry  Input
>> 4000B             3000A             19
>> 4000B             2000A             20
>> 3000A             3000B             18
>> 3000A             3000D             12
>> 2000A             1000D             25
>>
>> I am asking for your help on coding this.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/