Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Identify observations that appear in a list
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Identify observations that appear in a list
Date
Thu, 13 Mar 2014 11:50:26 +0000
This is an FAQ, at least in the sense that this is frequently asked here.
One approach is just to -merge- the data with a reduced copy of
itself, with the important twist that you -rename- what you want as an
identifier.
The slogan I use to remind myself of this trick is
"-merge- is for finding intersections as well as unions"
and you're welcome to pin or write it on a board near you.
http://www.stata.com/support/faqs/data-management/group-characteristics-for-subsets/
is also relevant.
. clear
. input str5 CustomerIndustry str5 SupplierIndustry Input
Custome~y Supplie~y Input
1. 1000A 4000B 100
2. 1000A 3000A 200
3. 1000A 3000B 100
4. 1000B 4000B 50
5. 1000B 2000A 8
6. 4000B 3000A 19
7. 4000B 2000A 20
8. 3000A 3000B 18
9. 3000A 3000D 12
10. 2000A 1000D 25
11. end
. save tostart
file tostart.dta saved
. bysort SupplierIndustry: keep if _n == 1
(4 observations deleted)
. keep SupplierIndustry
. rename SupplierIndustry CustomerIndustry
. merge 1:m CustomerIndustry using tostart
Result # of obs.
-----------------------------------------
not matched 8
from master 3 (_merge==1)
from using 5 (_merge==2)
matched 5 (_merge==3)
-----------------------------------------
. tab _merge
_merge | Freq. Percent Cum.
------------------------+-----------------------------------
master only (1) | 3 23.08 23.08
using only (2) | 5 38.46 61.54
matched (3) | 5 38.46 100.00
------------------------+-----------------------------------
Total | 13 100.00
.
end of do-file
. list if _merge==3
+-------------------------------------------+
| Custom~y Suppli~y Input _merge |
|-------------------------------------------|
2. | 2000A 1000D 25 matched (3) |
3. | 3000A 3000B 18 matched (3) |
6. | 4000B 3000A 19 matched (3) |
12. | 3000A 3000D 12 matched (3) |
13. | 4000B 2000A 20 matched (3) |
+-------------------------------------------+
Nick
[email protected]
On 13 March 2014 02:12, R Zhang <[email protected]> wrote:
> I have the following data set (HAVE) (only provide a few observations
> as illustration). The input variable gives the dollar input sold by
> supplier to customer. You will notice that customer industry 4000B,
> 3000A also appear in SupplierIndustry. This indicates that some
> industries can be both suppliers and customer.
>
> +++++++++++++++++++++++
>
> HAVE
>
> CustomerIndustry SupplierIndustry Input
>
> 1000A 4000B 100
>
> 1000A 3000A 200
>
> 1000A 3000B 100
>
> 1000B 4000B 50
>
> 1000B 2000A 8
>
> 4000B 3000A 19
>
> 4000B 2000A 20
>
> 3000A 3000B 18
>
> 3000A 3000D 12
>
> 2000A 1000D 25
>
> +++++++++++++++++++++++
>
> I want to create a dataset that list all customer industries that are
> also supplier industry, i.e., my output shall appear as :
>
> CustomerIndustry SupplierIndustry Input
>
> 4000B 3000A 19
>
> 4000B 2000A 20
>
> 3000A 3000B 18
>
> 3000A 3000D 12
>
> 2000A 1000D 25
>
> I am asking for your help on coding this.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/