Re: AW: st: How to create a random number identifier number
With the data still sorted by household (hhid), couldn't you simply replace
each person's random number with that of the first observation within the
household?
Immediately after generating the random sortvar:
by hhid: replace sortvar=sortvar[1]
- Nick Winter
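A minimal sketch of this idea on simulated data, in case it helps to see the
whole sequence. The variable npersons, the seed, the simulated hhid values and
the 10000 offset are assumptions made for illustration and are not from the
thread; -egen, group()- is swapped in here to number the households once they
share a random number. uniform() is used as elsewhere in the thread (newer
Stata versions call it runiform()).
********* begin example
clear
set seed 12345                          // arbitrary seed, only for reproducibility
set obs 5000                            // simulated households
gen long hhid = 20000 + _n              // stand-in for the real five-digit hhid
gen npersons = 1 + int(4*uniform())     // hypothetical household size, 1 to 4
expand npersons                         // one row per person
sort hhid
gen double sortvar = uniform()          // random number per person
by hhid: replace sortvar = sortvar[1]   // household shares the first person's draw
egen long newhhid = group(sortvar hhid) // 1, 2, ... in random household order
replace newhhid = newhhid + 10000       // assumed offset to reach five digits
* each hhid should map to exactly one newhhid, and vice versa
bysort hhid (newhhid): assert newhhid[1] == newhhid[_N]
bysort newhhid (hhid): assert hhid[1] == hhid[_N]
drop sortvar npersons
********* end example
Because every person in a household carries the same sortvar, the households
can be numbered without collapsing to one row per household and merging back.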
Anna Reimondos wrote:
I successfully implemented the solution proposed, and checked that
these were in fact unique identifiers. However I then ran into another
problem, when trying to do a similar thing for households!
Each of the 11,000 people live in households (around 5,000 households
in total) and there is a unique 5 digit household identifier which can
be used to see which people live in the same household. In other
words, several persons (identified by personid) live in the same
household (hhid). In the same way as I did for the "personid" I would
also like to create a new household identifier, that has five digits
and is unique.
Example:
person hhid "newhhid"
1 25643 13584
2 25643 13584
3 68534 34257
I tried modifying the code for the person id, and applying it to the
household id but this does not work because I can't randomly sort them
using the 'sortvar' variable, because it then loses the natural
ordering of the same household being on consecutive lines. My current
solution works, I think, but it means I keep only one line per
household, save off a new dataset, randomly sort it, create the new
identifier and then merge it back in. Would there be a way to do it
while still "staying" in the original dataset?
*-----------------------------------------------------------------------------------------------
*Save dataset
capture drop sortvar                      // as before: random number for random sorting
gen sortvar=1 + int(12759*uniform())
replace sortvar=sortvar+10000 if sortvar<10000
bysort hhid: gen numbers=_N               // how many people live in the household
keep hhid numbers sortvar
bysort hhid: gen first=_n if _n==1        // identify the first observation in each household
keep if first==1                          // keep only one observation (the first) per household
sort sortvar                              // randomly sort the data
gen newhhid=_n                            // new household ID
replace newhhid=newhhid+10000 if newhhid<10000   // pad to five digits
expand numbers                            // expand so each household has as many rows as people
sort hhid
*Merge back this dataset using hhid, into the original dataset.
*-----------------------------------------------------------------------------------------------
My original problem has been solved, and my current solution kind of
works, but I would be interested to hear if anyone has a more elegant
way of doing this...
Thanks very much,
Anna
On Fri, Nov 13, 2009 at 6:10 AM, Michael McCulloch <[email protected]> wrote:
Thanks, Martin. I imagine there's also a simpler (i.e. more elegant)
way to create the 5-digit new id than this:
replace newpersonid=newpersonid+50000 if newpersonid<11000
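One possibility, offered only as a sketch: since _n already runs from 1 to
11000 after the random sort, adding a constant in the -generate- itself gives
five digits without a conditional -replace-. The 10000 offset, the seed and
the variable u are assumptions for illustration.
********* begin example
clear
set seed 2009                           // arbitrary seed
set obs 11000
gen double u = uniform()                // random sort key
sort u
gen long newpersonid = 10000 + _n       // 10001 to 21000
isid newpersonid                        // errors out if ids are not unique
assert inrange(newpersonid, 10000, 99999)
********* end example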
On Nov 12, 2009, at 12:11 AM, Martin Weiss wrote:
<>
The -destring- line could easily be omitted, without loss of
functionality...
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Michael
McCulloch
Sent: Thursday, 12 November 2009 04:16
To: [email protected]
Subject: Re: st: How to create a random number identifier number
Anna,
This simulated example is a better approach that is faithful to your
need for the newpersonid to have 5 digits.
Michael
********* begin example
clear
set obs 11000
* simulated five-digit ids 10001-21000; adding 10000 only where
* personid<10000 would duplicate the ids 10001-11000
gen personid=10000+_n
gen sortvar=1 + int(11000*uniform())
replace sortvar=sortvar+10000 if sortvar<10000
sort sortvar
gen str5 newpersonid=string(_n)
destring newpersonid, replace
replace newpersonid=newpersonid+50000 if newpersonid<11000
list personid newpersonid in 10050/11000
codebook
********* end example
Dear Anna, if you sort on some variable other than personid, or
perform a random sort, you could then:
gen new_personid = _n
This creates a variable which has a value equal to the sequence # of
that record, which is why you have to create some sort order other
than personid.
Michael
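As a rough sketch of that suggestion, assuming simulated ids in place of the
real personid: the seed, the sort key u and the file name concordance are
hypothetical and not from the thread. Note that new_personid here is just the
sequence number, so it does not yet have five digits.
********* begin example
clear
set seed 4711                           // arbitrary seed, so the order can be reproduced
set obs 11000
gen long personid = 10000 + _n          // simulated stand-in for the real ids
gen double u = uniform()
sort u                                  // random order, unrelated to personid
gen long new_personid = _n              // sequence number in the random order
isid new_personid
keep personid new_personid
sort personid
save concordance, replace               // hypothetical file name
********* end example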
On Nov 11, 2009, at 6:37 PM, Anna Reimondos wrote:
Hello,
I am experiencing problems creating a unique set of numbers for my
dataset.
I have a dataset with around 11,000 subjects or persons, and each one
of these subjects has a unique identifier that is 5 digits long
(personid).
I need to create a concordance file which lists the original 5 digit
"personid" and matches this to another new randomly created
identifier
for each person. This new identifier (new_personid) also has to be 5
digits long.
Example:
personid new_personid
10526 35624
18594 21893
54632 12489
I have tried playing around with the gen x = uniform() function but
to no avail. I am unable to create exactly 11,000 unique numbers with
5 digits.
I also tried just using the egen x=seq() command, but then the ids are
sequential and not random, and I am afraid that someone could then
figure out how to match the personid and the new_personid.
Any help would be much appreciated,
Thanks
Anna
(Using Stata 10.1, Windows Vista)
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
Michael McCulloch
Pine Street Foundation
124 Pine Street
San Anselmo, CA 94960-2674
tel: 415-407-1357
fax: 206-338-2391
[email protected]
--
--------------------------------------------------------------
Nicholas Winter 434.924.6994 t
Assistant Professor 434.924.3359 f
Department of Politics [email protected] e
University of Virginia faculty.virginia.edu/nwinter w
PO Box 400787, 100 Cabell Hall
Charlottesville, VA 22904