Re: st: STATA code for identifying similar observations within groupid
Please note that you are asked not to send attachments to the list and
not to include copies of irrelevant previous mailings.
The same kinds of problems came up a few days ago. For example, see
<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0805/Author/article-1121.html>
and linked postings.
Dummy for inventors on the same patent all from the same country:
bysort pub_nbr (inv_cou) : gen byte same_inv_cou = inv_cou[1] == inv_cou[_N]
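Why this works: within each patent the observations are sorted on inv_cou, so the first and last values are the smallest and largest codes, and they can only be equal if every inventor has the same code. A minimal sketch on toy data follows; the five rows are typed in for illustration and only loosely follow the examples in the question.

```stata
* Toy data: one row per inventor, several rows per patent
clear
input str9 pub_nbr str2 inv_cou
"EP0000287" "HU"
"EP0000287" "HU"
"EP1701504" "DE"
"EP1701504" "DE"
"EP1701504" "PL"
end

* Sort within patent on country; first == last means all are equal
bysort pub_nbr (inv_cou) : gen byte same_inv_cou = inv_cou[1] == inv_cou[_N]

list, sepby(pub_nbr)
```

Here same_inv_cou should be 1 on both rows of EP0000287 and 0 on all rows of EP1701504. Note that the prefix and the command must sit on one line (or be joined with ///).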
Dummy for all Eastern European:
You need a variable in_EE, 1 for in EE and 0 otherwise. Then it's almost
the same idea:
bysort pub_nbr (in_EE) : gen byte all_in_EE = in_EE[1] == in_EE[_N] & in_EE[1] == 1
Or alternatively
bysort pub_nbr (in_EE) : gen byte all_in_EE = in_EE[1] == 1 & in_EE[_N] == 1
At least one country is in OECD:
You need a variable in_OECD, 1 for in OECD and 0 otherwise. Then
bysort pub_nbr (in_OECD) : gen byte any_in_OECD = in_OECD[_N] == 1
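A sketch of both indicator-based dummies on toy data, assuming in_EE and in_OECD already exist and contain no missing values. The patent numbers and the 0/1 values below are made up for illustration and are not a statement about which countries belong to either group. The -egen- lines are a swapped-in cross-check, not part of the approach above.

```stata
* Toy data: indicators are typed in by hand for illustration only
clear
input str9 pub_nbr str2 inv_cou in_EE in_OECD
"EP0000001" "HU" 1 1
"EP0000001" "PL" 1 1
"EP0000002" "DE" 0 1
"EP0000002" "PL" 1 1
"EP0000003" "BG" 1 0
"EP0000003" "BG" 1 0
end

* All inventors in EE: smallest and largest value within patent are both 1
bysort pub_nbr (in_EE) : gen byte all_in_EE = in_EE[1] == 1 & in_EE[_N] == 1

* At least one inventor in OECD: largest value within patent is 1
bysort pub_nbr (in_OECD) : gen byte any_in_OECD = in_OECD[_N] == 1

* Cross-check with group minimum and maximum
bysort pub_nbr : egen byte all_EE_chk = min(in_EE)
bysort pub_nbr : egen byte any_OECD_chk = max(in_OECD)
assert all_in_EE == all_EE_chk
assert any_in_OECD == any_OECD_chk

list, sepby(pub_nbr)
```

Missing values sort last in Stata, so a missing in_OECD within a patent would end up in position _N and any_in_OECD would come out as 0. It is worth checking for missings first. If "multiple countries" is meant literally in the definitions, one possibility is to combine a dummy with the same-country dummy, for example all_in_EE & !same_inv_cou.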
I am not clear about your fourth definition. It should yield to a similar
technique.
How do you get these extra variables? See the linked FAQs
How do I select a subset of observations using a complicated criterion?
<http://www.stata.com/support/faqs/data/selectid.html>
How do you define group characteristics in your data in order to create
subsets?
<http://www.stata.com/support/faqs/data/characteristics.html>
You may find the -merge- method easiest for your set-up.
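One way the -merge- route might look, as a sketch only: keep a small lookup dataset with one row per country code and the two indicators, then merge it onto the inventor-level data. This assumes Stata 11 or later for the m:1 syntax (older versions use a different -merge- syntax). The lookup name and its 0/1 values are hypothetical and would be replaced by your own list of EE and OECD codes.

```stata
* Lookup table: one row per country code (indicator values are made up)
clear
input str2 inv_cou in_EE in_OECD
"DE" 0 1
"HU" 1 1
"PL" 1 1
end
tempfile lookup
save `lookup'

* Inventor-level data: one row per inventor, several rows per patent
clear
input str9 pub_nbr str2 inv_cou
"EP0000287" "HU"
"EP0000287" "HU"
"EP1701504" "DE"
"EP1701504" "PL"
end

* Many inventor rows match one lookup row per country code
merge m:1 inv_cou using `lookup', keep(master match)

* Stop if any country code was not found in the lookup
assert _merge == 3
drop _merge

list, sepby(pub_nbr)
```

The assert is there because an unmatched code would leave in_EE and in_OECD missing, which would quietly distort the dummies built from them.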
Nick
Chirantan Chatterjee [edited]
I am working on a dataset of European Patents, patents that have at
least one inventor (variable -inv_cou-) belonging to an Eastern European
country.
There are 21 such EE countries, identified with International Patent
Classification codes. Thus for the patent EP1701504, there are 5
inventors, 4 German, identified by "DE" in -inv_cou-, and one Polish,
identified by "PL". Apart from EE countries, inventors for a
multi-inventor patent also come from OECD countries, again identified by
IPC codes, DE for Germany, MX for Mexico, KR for Korea and likewise.
The observations are not uniquely identified by the patent identifier,
pub_nbr or publication number. Thus for patent EP1701504, EP1701504 is
the value under pub_nbr which is stacked one upon another for each of
its 5 inventors. There are some other characteristics too for a patent
that come in the dataset.
Here is a shortened sketch for the data, for patent EP0000287,
identified by pub_nbr, the patent identifier: it has two inventors
stacked one upon another.
pub_nbr    inv_name       inv_city     inv_cou  inv_total  app_city  app_cou  app_name
EP0000287  Szabó, Sándor  Budapest XI  HU       5          Budapest  HU       AUTÓIPARI
EP0000287  Vad, László    Visegrád     HU       5          Budapest  HU       Ikarus
My objective is to create dummy variables telling me whether the
inventors that created the patent are:
a. Located in the same country.
b. Resident in multiple countries, but all of the countries are EE
countries. (Have the EE country code set)
c. Resident in multiple countries, and at least one of the countries is
an OECD member state. (Have the OECD country code set)
d. When the patent applicant is located in an OECD country, app_cou
identifies applicant country like for inventor countries as you will see
in the attached sample of the data.
What is the best way to create each of the four dummy variables?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/