Re: st: Unique identifier from a string name
From: Maarten Buis <[email protected]>
To: [email protected]
Subject: Re: st: Unique identifier from a string name
Date: Thu, 24 Nov 2011 17:03:05 +0100
On Thu, Nov 24, 2011 at 4:10 PM, Barry Quinn wrote:
> The context of the problem is to build a panel from yearly data using firm names as the unique id with the -merge- command.
One solution is to first create a file containing all firms, create the
unique identifier there, and then merge this identifier onto each of the
yearly files. Say you have three years stored in files called year1,
year2, and year3, and the firm name is stored in the variable firm:
*---------- begin example ----------
// stack all files
use year1, clear
forvalues i = 2/3 {
    append using year`i'
}
// keep only the firm names
keep firm
// we only need one observation per firm
bys firm : keep if _n == 1
// create the unique id
gen firmid = _n
// save this key in a file
save idkey, replace
// add the id to each dataset
forvalues i = 1/3 {
    use year`i'
    merge 1:1 firm using idkey
    // every firm in year`i' got an id
    assert _merge != 1
    // not all firms have to appear in year `i'
    drop if _merge == 2
    // _merge is no longer necessary
    drop _merge
    // I never overwrite the original data
    // hence a new filename
    save year`i'_id, replace
}
*------------ end example ------------
The only problem with this approach is that it assumes that the
variable firm contains no typos, that there are no legitimate (or
illegitimate) alternative spellings and/or abbreviations, and that the
firm names remained constant. In practice that is highly unlikely, so
I would carefully check the idkey file before merging.
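One way to start that check is to look for names that differ only in
case or spacing. The following is a minimal sketch, not a complete
solution: the firm names are made up and stand in for the contents of
idkey, the variable names firm_clean and nspell are hypothetical, and it
assumes that differences in case and spacing are harmless. Real typos
and abbreviations will still need to be inspected by eye. In recent
versions of Stata the same string functions are also available as
strupper(), strtrim(), and stritrim().

```stata
// a small made-up key file, standing in for idkey
clear
input str20 firm
"Acme Corp"
"ACME CORP"
"Acme  Corp"
"Beta Ltd"
end
// normalize case and spacing
// (assumption: such differences do not distinguish firms)
gen firm_clean = upper(trim(itrim(firm)))
// firm is unique within the key file, so the group size
// is the number of different spellings per cleaned name
bys firm_clean (firm) : gen nspell = _N
// show the spellings that collapse to the same cleaned name
list firm firm_clean if nspell > 1, sepby(firm_clean)
```

If the listed spellings really do refer to the same firm, the id could
be created from the cleaned name rather than from the raw name.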
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------