First off, Lindsay's obvious need is to concatenate string variables. That can and should be done directly using + or -egen, concat()-. There is never any need to convert to numeric and then do the equivalent arithmetic.
gen hhidpn = hhid + pn
should suffice, and if not, there will be small work-arounds. (If Lindsay absolutely needs a numeric version of that result, then -destring- is the way to go, subject to the point below.)
Second, as Martin has hinted, the problem is not in -destring-, an excellent command, but in the way it is being used.
Lindsay insisted on -float- results, but needs extra precision to carry each digit precisely. Don't use -float- therefore.
Third, it seems that the -float- option could be documented in more detail to warn of this problem (or undocumented so that only more experienced users discover it!). Otherwise users could be punished, like Lindsay, in being given what they ask for.
Nick
[email protected]
Martin Weiss
Also note http://www.ats.ucla.edu/stat/stata/faq/longid.htm
and
http://www.stata.com/support/faqs/data/prec.html
Lindsay
I am using Stata/SE 10.1 and having problems executing what should be
a really simple operation. My dataset has household id (hhid) and
person number (pn) variables in string format. I need to combine them
into one numeric unique identifer of the form hhidpn = hhid*1000+pn in
order to merge them with other data. After I destring the original
IDs (which appears to work fine) and perform this operation, some of
the identifiers are duplicates. It looks like Stata is somehow adding
some of the numbers incorrectly (mostly they are +/-1 from what they
should be). I have copied some of the output below.
I've also tried adding the two string variables first and then
destringing and I get the same problem. The string variable with both
IDs combined looks right, but after I destring some are wrong. Any
suggestions what might be going on?
Thanks, Lindsay
/**** FIRST METHOD (DESTRING THEN ADD) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;
. destring HHID, gen(hhid) float;
HHID has all characters numeric; hhid generated as float
. destring PN, gen(pn) float;
PN has all characters numeric; pn generated as byte
. replace hhid = hhid*1000;
(30712 real changes made)
. format hhid %9.0f;
. gen hhidpn = hhid + pn;
. format hhidpn %9.0f;
. sort hhidpn
. list HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]
+-------------------------------------------+
HHID hhid PN pn hhidpn
-------------------------------------------
1501. 016973 16973000 031 31 16973032
1502. 016973 16973000 032 32 16973032
1641. 017530 17530000 040 40 17530040
1642. 017530 17530000 041 41 17530040
1661. 017641 17641000 011 11 17641012
-------------------------------------------
1662. 017641 17641000 012 12 17641012
1666. 017646 17646000 040 40 17646040
1667. 017646 17646000 041 41 17646040
1679. 017707 17707000 040 40 17707040
1680. 017707 17707000 041 41 17707040
-------------------------------------------
1832. 018435 18435000 040 40 18435040
1833. 018435 18435000 041 41 18435040
1849. 018482 18482000 040 40 18482040
1850. 018482 18482000 041 41 18482040
1854. 018494 18494000 020 20 18494020
....
/**** SECOND METHOD (ADD THEN DESTRING) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;
. gen HHIDPN = HHID + PN;
. destring HHIDPN, gen(hhidpn) float;
HHIDPN has all characters numeric; hhidpn generated as float
. format hhidpn %9.0f;
. sort hhidpn
. list HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]
+--------------------------------------+
HHID PN HHIDPN hhidpn
--------------------------------------
1501. 016973 031 016973031 16973032
1502. 016973 032 016973032 16973032
1641. 017530 040 017530040 17530040
1642. 017530 041 017530041 17530040
1661. 017641 011 017641011 17641012
--------------------------------------
1662. 017641 012 017641012 17641012
1666. 017646 040 017646040 17646040
1667. 017646 041 017646041 17646040
1679. 017707 040 017707040 17707040
1680. 017707 041 017707041 17707040
--------------------------------------
1832. 018435 040 018435040 18435040
1833. 018435 041 018435041 18435040
1849. 018482 040 018482040 18482040
1850. 018482 041 018482041 18482040
1854. 018494 020 018494020 18494020
....
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/