Re: st: how to assign new identifier numbers with duplicates
Dear Svend,
That's really great! Thank you so much for the demonstration of building a new
id combining the region (or subregion) with the firm number. I am learning Stata
and you really helped me.
Linda
From: "Svend Juul" <[email protected]>
Reply-To: [email protected]
To: <[email protected]>
Subject: Re: st: how to assign new identifier numbers with duplicates
Date: Wed, 28 Mar 2007 09:35:55 +0200
Linda wrote:
I am using firm-level panel data for performance analysis
of firms. But I found that my dataset has duplicated identifiers
(e.g. the same identity numbers for two firms in two different
regions in a certain year). My dataset looks as follows
(a1 is a code for high-level region, a2 is the subregions, a3
is the firm identity number):
year a1 a2 a3
1995 450 57 206141
1995 450 54 206141
1996 450 57 206141
1996 450 54 206141
1997 450 57 206141
1997 450 54 206141
1995 470 41 223243
1995 470 43 223243
1995 470 44 223243
1996 470 41 223243
1996 470 44 223243
1997 470 41 223243
1998 470 41 223243
2000 470 41 223243
At the moment, I don't want to consider the differences between subregions.
So, I want to change the identity number such that I have uniquely
identified observations by the identifier variable a3 and year....
----------------------------------------------------------------
I am not sure what you want. You tell us that two different firms
in two different regions (a1) can have the same id number (a3). You
then consider it a problem that the same firm id occurs several
times in a given year. From your sample data it seems that this
occurs because the same firm has a record for each year and subregion,
but you don't want to consider subregions.
If the problem is, as you describe, that the same firm id is used for
different firms in different regions, you could combine firm id (a3)
and region id (a1) to get a unique firm id. But first a couple of
warnings.
1) In your attempts you -replace-d the original firm id by a modified id,
   thus destroying the original information. This is dangerous behaviour.
2) In long id numbers you may get precision problems; use string variables
   to prevent that (see http://www.ats.ucla.edu/stat/stata/faq/longid.htm).
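
The precision problem in warning 2 can be illustrated with a minimal sketch. The variable names (id_float, id_double, id_str) and the nine-digit value are made up for illustration and do not come from the dataset discussed here:

```stata
* Sketch only: id_float, id_double, id_str and the value are illustrative
clear
set obs 1
generate float  id_float  = 123456789
generate double id_double = 123456789
generate str9   id_str    = "123456789"
format id_float id_double %12.0f
list
* A float cannot hold all nine digits exactly, so the two copies differ
assert id_float != id_double
```

The string copy keeps every digit as typed, which is why a string id is the safer choice for long identifiers.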
Here I construct the string variable -newid-; for the first observation
in the sample data it becomes "206141-450":
generate sa1=string(a1,"%03.0f")
generate sa3=string(a3,"%6.0f")
generate newid=sa3 + "-" + sa1
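
As a rough check, and only as a sketch, the same three commands can be run on a few rows of the sample data to see how often a given newid still repeats within a year. The variable name firstrec is hypothetical, and whether the repeated firm-year records should then be dropped or combined depends on what is wanted, so nothing is deleted here and a3 is left untouched:

```stata
* Sketch only: a few rows from the sample data; firstrec is a made-up name
clear
input year a1 a2 a3
1995 450 57 206141
1995 450 54 206141
1996 450 57 206141
1995 470 41 223243
1995 470 43 223243
end
generate sa1 = string(a1, "%03.0f")
generate sa3 = string(a3, "%6.0f")
generate newid = sa3 + "-" + sa1
* How many records share the same newid and year?
duplicates report newid year
* Mark exactly one record per newid-year group (1 = marked, 0 = the rest)
egen byte firstrec = tag(newid year)
list year a1 a2 a3 newid firstrec
```

If newid and year did identify the records uniquely, -isid newid year- would run without error; in the sample data the subregion records prevent that.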
Hope this helps
Svend
__________________________________________
Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(Institute of Public Health, Department of Epidemiology)
Vennelyst Boulevard 6
DK-8000 Aarhus C, Denmark
Phone: +45 8942 6090
Home: +45 8693 7796
Email: [email protected]
__________________________________________
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/