Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Re: how to search every observation of one variable in another variable
From: "Joseph Coveney" <[email protected]>
To: <[email protected]>
Date: Thu, 13 Jun 2013 10:58:24 +0900
ibrahim bostan wrote:
I have a dataset which has the id numbers of patents applied for by a firm and
the id numbers of the patents cited by each of those patents. I am trying to
create an indicator, as shown below, which equals one if a patent applied for
by the firm cites a patent that was applied for by the same firm.
cited_pt_no  citing_pt_no  patent_owner  indicator
         10            20  a                     0
         11            21  a                     0
         21            22  a                     1
         20            23  a                     1
         20            24  b                     0
         24            25  b                     1
         25            26  b                     1
          1            27  c                     0
          3            28  c                     0
          5            29  c                     0
--------------------------------------------------------------------------------
You could try something like the example below.
-merge- has no direct equivalent of SQL's . . . JOIN . . . ON A.one_column =
B.another_column, and so the example explicitly forms the Cartesian product
(with -cross-) and then applies the restriction.
If your dataset is small enough, you could also try an in-memory hash-table
approach ( www.stata.com/statalist/archive/2013-06/msg00569.html ).
Joseph Coveney
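For the hash-table idea mentioned above, a minimal sketch using Mata's associative arrays (-asarray()-) might look like the following. It is not taken from the linked post; it only assumes the ten-row example dataset and the variable names used in this thread, and the Mata names A, no, cited, owner and ind are made up for illustration.

```stata
* Sketch only: look up each cited patent's owner in a Mata associative array
clear
input cited_pt_no citing_pt_no str1 patent_owner
10 20 a
11 21 a
21 22 a
20 23 a
20 24 b
24 25 b
25 26 b
1 27 c
3 28 c
5 29 c
end

mata:
// key = patent number (real scalar), value = owner (string scalar)
A = asarray_create("real")
// a patent number that is not in the table returns the empty string
asarray_notfound(A, "")

no    = st_data(., "citing_pt_no")
cited = st_data(., "cited_pt_no")
owner = st_sdata(., "patent_owner")

// fill the table: one entry per citing patent
for (i = 1; i <= rows(no); i++) {
    asarray(A, no[i], owner[i])
}

// indicator is 1 when the cited patent's owner equals the citing patent's owner
ind = J(rows(no), 1, 0)
for (i = 1; i <= rows(no); i++) {
    ind[i] = (asarray(A, cited[i]) == owner[i])
}

st_store(., st_addvar("byte", "indicator"), ind)
end

list, noobs separator(0) abbreviate(20)
```

This assumes that patent_owner is never empty, because an empty owner would compare equal to the not-found value, and that each citing_pt_no appears only once.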
. version 12.1
.
. clear *
. set more off
.
. input cited_pt_no citing_pt_no str1 patent_owner
cited_p~o citing_~o patent_~r
1. 10 20 a
2. 11 21 a
3. 21 22 a
4. 20 23 a
5. 20 24 b
6. 24 25 b
7. 25 26 b
8. 1 27 c
9. 3 28 c
10. 5 29 c
11. end
.
. // Create dataset of citing-patent owners
. preserve
. isid citing_pt_no
. rename patent_owner citing_owner
. list citing*, noobs abbreviate(20)
+-----------------------------+
| citing_pt_no citing_owner |
|-----------------------------|
| 20 a |
| 21 a |
| 22 a |
| 23 a |
| 24 b |
|-----------------------------|
| 25 b |
| 26 b |
| 27 c |
| 28 c |
| 29 c |
+-----------------------------+
. tempfile tmpfil0
. quietly save `tmpfil0'
.
. // Create a dataset of cited-patent owners
. restore
. preserve
. contract cited, freq(discard)
. rename cited_pt_no citing_pt_no
. merge 1:1 citing_pt_no using `tmpfil0', nogenerate noreport
. replace cited_pt_no = citing_pt_no
(15 real changes made)
. rename citing_owner cited_owner
. quietly replace cited_owner = "Other" if mi(cited_owner)
. keep cited*
. list , noobs separator(0) abbreviate(20)
+---------------------------+
| cited_pt_no cited_owner |
|---------------------------|
| 1 Other |
| 3 Other |
| 5 Other |
| 10 Other |
| 11 Other |
| 20 a |
| 21 a |
| 24 b |
| 25 b |
| 22 a |
| 23 a |
| 26 b |
| 27 c |
| 28 c |
| 29 c |
+---------------------------+
.
. // Compare cited owner to citing owner
. cross using `tmpfil0'
. quietly save `tmpfil0', replace
. restore
. contract *pt_no, freq(discard)
. merge 1:m citing_pt_no cited_pt_no using `tmpfil0', ///
> assert(match using) keep(match) nogenerate noreport
. generate byte indicator = cited_owner == citing_owner
. list *_pt_no *_owner indicator, noobs separator(0) abbreviate(20)
+---------------------------------------------------------------------+
| cited_pt_no citing_pt_no cited_owner citing_owner indicator |
|---------------------------------------------------------------------|
| 10 20 Other a 0 |
| 11 21 Other a 0 |
| 21 22 a a 1 |
| 20 23 a a 1 |
| 20 24 a b 0 |
| 24 25 b b 1 |
| 25 26 b b 1 |
| 1 27 Other c 0 |
| 3 28 Other c 0 |
| 5 29 Other c 0 |
+---------------------------------------------------------------------+
.
. exit
end of do-file
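
If every citing_pt_no is unique (the -isid- check in the log above), a shorter route that avoids the Cartesian product may also work: save the citing patents as an owner lookup, rename its key to cited_pt_no, and -merge m:1- on that. This is a sketch under that assumption rather than a replacement for the approach above; the tempfile name lookup is made up for illustration.

```stata
* Sketch only: rename the key so that -merge- can match cited on citing
clear
input cited_pt_no citing_pt_no str1 patent_owner
10 20 a
11 21 a
21 22 a
20 23 a
20 24 b
24 25 b
25 26 b
1 27 c
3 28 c
5 29 c
end

* Build the lookup: one row per patent applied for, with its owner
preserve
keep citing_pt_no patent_owner
isid citing_pt_no
rename (citing_pt_no patent_owner) (cited_pt_no cited_owner)
tempfile lookup
quietly save `lookup'
restore

* Attach the cited patent's owner; unmatched rows keep an empty cited_owner
merge m:1 cited_pt_no using `lookup', keep(master match) nogenerate noreport
generate byte indicator = cited_owner == patent_owner
sort citing_pt_no
list *_pt_no cited_owner patent_owner indicator, noobs separator(0) abbreviate(20)
```

Cited patents that no firm in the dataset applied for are left with an empty cited_owner instead of "Other", so their indicator is 0 as long as patent_owner is never empty. On the example data this should give the same indicator values as the listing above.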