Have you thought about -egen, group()- as an alternative to your method?
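A minimal sketch of the -egen, group()- approach. It assumes the string variables HHID and PN from the message below and uses a few of their values, typed in by hand, as a toy dataset. group() assigns consecutive integers 1, 2, 3, ... to the distinct HHID/PN combinations in the data currently in memory. The result is unique here, but it does not reproduce hhid*1000+pn, so it would match ids in another file only if that file were numbered the same way.

```stata
* Toy data: string ids as in the original post (values taken from the listing)
clear
input str6 HHID str3 PN
"016973" "031"
"016973" "032"
"017530" "040"
"017530" "041"
end

* One integer per distinct HHID/PN combination, stored as long
egen long id = group(HHID PN)

* Confirm that the new variable uniquely identifies the observations
isid id
list HHID PN id
```

For the merge itself, Stata's -merge- also accepts both key variables at once (HHID PN), which avoids building a combined id at all.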
HTH
Martin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lindsay
Sent: Wednesday, 23 September 2009 16:01
To: [email protected]
Subject: st: problem with destring
I am using Stata/SE 10.1 and having problems executing what should be
a really simple operation. My dataset has household id (HHID) and
person number (PN) variables in string format. I need to combine them
into one numeric unique identifier of the form hhidpn = hhid*1000+pn in
order to merge them with other data. After I destring the original
IDs (which appears to work fine) and perform this operation, some of
the identifiers are duplicates. It looks like Stata is somehow adding
some of the numbers incorrectly (mostly they are +/-1 from what they
should be). I have copied some of the output below.
I've also tried concatenating the two string variables first and then
destringing, and I get the same problem. The combined string variable
looks right, but after I destring it, some values are wrong. Any
suggestions as to what might be going on?
Thanks, Lindsay
/**** FIRST METHOD (DESTRING THEN ADD) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;
. destring HHID, gen(hhid) float;
HHID has all characters numeric; hhid generated as float
. destring PN, gen(pn) float;
PN has all characters numeric; pn generated as byte
. replace hhid = hhid*1000;
(30712 real changes made)
. format hhid %9.0f;
. gen hhidpn = hhid + pn;
. format hhidpn %9.0f;
. sort hhidpn
. list HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
      +-----------------------------------------+
      |   HHID       hhid    PN   pn     hhidpn |
      |-----------------------------------------|
1501. | 016973   16973000   031   31   16973032 |
1502. | 016973   16973000   032   32   16973032 |
1641. | 017530   17530000   040   40   17530040 |
1642. | 017530   17530000   041   41   17530040 |
1661. | 017641   17641000   011   11   17641012 |
      |-----------------------------------------|
1662. | 017641   17641000   012   12   17641012 |
1666. | 017646   17646000   040   40   17646040 |
1667. | 017646   17646000   041   41   17646040 |
1679. | 017707   17707000   040   40   17707040 |
1680. | 017707   17707000   041   41   17707040 |
      |-----------------------------------------|
1832. | 018435   18435000   040   40   18435040 |
1833. | 018435   18435000   041   41   18435040 |
1849. | 018482   18482000   040   40   18482040 |
1850. | 018482   18482000   041   41   18482040 |
1854. | 018494   18494000   020   20   18494020 |
....
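The listing is consistent with float rounding rather than faulty addition. A -float- holds every integer exactly only up to 2^24 = 16,777,216; between 2^24 and 2^25 only even integers can be stored, so an odd id such as 16973031 is rounded to an even neighbour (up or down, which matches the +/-1 pattern). Every duplicated hhidpn shown above falls in that range and is even. Stata's float() function rounds a value to float precision, so the effect can be checked directly. This is a sketch using ids taken from the listing, not a diagnosis of the full file.

```stata
* Every integer up to 2^24 is exactly representable as a float
display %12.0f 2^24

* An odd id above 2^24 cannot be stored as a float and is rounded to an even neighbour
display %12.0f float(16973031)
display %12.0f float(16973032)

* Both pairs of ids collapse to the same float value, hence the duplicates in hhidpn
assert float(16973031) == float(16973032)
assert float(17530041) == float(17530040)
```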
/**** SECOND METHOD (ADD THEN DESTRING) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;
. gen HHIDPN = HHID + PN;
. destring HHIDPN, gen(hhidpn) float;
HHIDPN has all characters numeric; hhidpn generated as float
. format hhidpn %9.0f;
. sort hhidpn
. list HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
      +-------------------------------------+
      |   HHID    PN      HHIDPN     hhidpn |
      |-------------------------------------|
1501. | 016973   031   016973031   16973032 |
1502. | 016973   032   016973032   16973032 |
1641. | 017530   040   017530040   17530040 |
1642. | 017530   041   017530041   17530040 |
1661. | 017641   011   017641011   17641012 |
      |-------------------------------------|
1662. | 017641   012   017641012   17641012 |
1666. | 017646   040   017646040   17646040 |
1667. | 017646   041   017646041   17646040 |
1679. | 017707   040   017707040   17707040 |
1680. | 017707   041   017707041   17707040 |
      |-------------------------------------|
1832. | 018435   040   018435040   18435040 |
1833. | 018435   041   018435041   18435040 |
1849. | 018482   040   018482040   18482040 |
1850. | 018482   041   018482041   18482040 |
1854. | 018494   020   018494020   18494020 |
....
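If float precision is the cause, both methods should work once the combined id is kept in a type with enough precision: a -long- holds integers exactly up to 2,147,483,620, and a -double- holds them exactly up to 2^53, so either is ample for nine-digit ids. The sketch below uses a toy dataset typed in from the listing. It drops the -float- option from destring (whose default is to create a double and then compress it) and stores the sum explicitly as long. The variable names HHIDPN2 and hhidpn2 are illustrative only.

```stata
* Toy data: string ids as in the original post (values taken from the listing)
clear
input str6 HHID str3 PN
"016973" "031"
"016973" "032"
"017530" "040"
"017530" "041"
end

* Method 1, revised: destring without -float-, then store the sum as long
destring HHID, gen(hhid)
destring PN, gen(pn)
gen long hhidpn = hhid*1000 + pn
format hhidpn %12.0f
isid hhidpn

* Method 2, revised: concatenate the strings, then destring without -float-
gen HHIDPN2 = HHID + PN
destring HHIDPN2, gen(hhidpn2)
format hhidpn2 %12.0f

* Both routes should now give the same, exact identifiers
assert hhidpn2 == hhidpn
list HHID PN hhidpn hhidpn2
```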
--
Lindsay Sabik
Doctoral Candidate in Health Policy
Harvard University
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*