Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Unique identifier from a string name
From: [email protected] (Brendan Halpin)
To: [email protected]
Subject: Re: st: Unique identifier from a string name
Date: Thu, 24 Nov 2011 16:18:47 +0000
If you really need a deterministic mapping between strings and integers,
it might be worth dumbing down the strings as much as possible first
(e.g., remove spaces and punctuation, and convert to lowercase). Then map
each symbol in the much-reduced set (perhaps only 26 letters) to a single
integer and proceed as I suggested before, but with 26 (or however many
symbols remain) as the multiplier instead of 256.
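A minimal sketch of the reduce-then-encode idea follows. It is written in Python rather than Stata purely for illustration, and the helper names `normalize` and `string_to_int` are hypothetical. It assumes the earlier suggestion meant reading a string as a base-256 number, with one digit per character. Here the letters are numbered 1 to 26 rather than 0 to 25, so that a leading letter is never lost. With a=0, "ab" and "b" would both map to 1.

```python
import string

# Hypothetical helper: reduce a name to lowercase ASCII letters only
# (drops spaces, punctuation, digits and any other characters).
def normalize(name: str) -> str:
    return "".join(ch for ch in name.lower() if ch in string.ascii_lowercase)

# Hypothetical helper: map a=1, b=2, ..., z=26 and read the reduced string
# as a number with multiplier 26. Because the digits run 1..26 rather than
# 0..25 (bijective base-26), distinct letter strings get distinct integers.
def string_to_int(name: str) -> int:
    code = 0
    for ch in normalize(name):
        code = code * 26 + (ord(ch) - ord("a") + 1)
    return code

print(string_to_int("Ab"), string_to_int("a b"), string_to_int("z"), string_to_int("aa"))
```

This prints `28 28 26 27`. "Ab" and "a b" collide on purpose, because the normalization treats them as the same name. Python integers never overflow, but Stata doubles hold integers exactly only up to 2^53, so under this scheme any name of up to 11 letters is safe. A smaller multiplier lets more characters fit under that limit, which is one reason to reduce the symbol set first.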
I am presuming that you have strings in different datasets that you want
to match, so encode won't work: it assigns integers based only on the
strings present in the current dataset, so the same string can get
different codes in different datasets. It might be worth, though, seeing
if you can create a master dataset (e.g., by appending rather than
merging) and then encoding. You could then split out the original
datasets and merge.
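The following is a rough Python analogue of the append-then-encode approach, using made-up datasets. The helpers `build_codes` and `add_codes` are hypothetical and are not Stata commands. In Stata the steps would be `append`, `encode`, and then splitting and merging. The sketch shows only why a single lookup table built from the pooled strings gives every dataset consistent codes. It numbers the distinct strings in sorted order, which is assumed here to be similar to what `encode` does by default.

```python
# Two hypothetical datasets that share a string key, "name".
survey = [{"name": "Smith", "age": 40}, {"name": "Jones", "age": 35}]
register = [{"name": "Jones", "region": "West"}, {"name": "Brown", "region": "North"}]

def build_codes(*datasets, key="name"):
    # Pool the key values from every dataset (the "append" step), then
    # number the distinct strings 1..n in sorted order (the "encode" step).
    pooled = {row[key] for data in datasets for row in data}
    return {s: i for i, s in enumerate(sorted(pooled), start=1)}

def add_codes(data, codes, key="name"):
    # Attach the shared code to each row. Every dataset uses the same
    # lookup table, so a given string always gets the same integer.
    return [{**row, key + "_id": codes[row[key]]} for row in data]

codes = build_codes(survey, register)
survey_coded = add_codes(survey, codes)
register_coded = add_codes(register, codes)

# Merge on the integer code (an inner join, for illustration only).
by_id = {row["name_id"]: row for row in register_coded}
merged = [{**row, **by_id[row["name_id"]]}
          for row in survey_coded if row["name_id"] in by_id]
print(codes)
print(merged)
```

Here "Jones" gets code 2 in both datasets, even though neither dataset alone contains all three names. Encoding each dataset separately would not guarantee that.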
Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-338562; Room F1-009 x 3147
mailto:[email protected] ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress twitter:@ULSociology