Thank you, Martin and Nick for your help. When I first concatenate
the string variable and then destring without specifying float I get
what I need.
Using my first method (which I now realize was unnecessary) even
without specifying float when using -destring- my hhid variable is
stored as long and my pn variable as int, though when I combine them
(hhidpn = hhid*1000+pn) Stata was storing this as float, which was
what originally gave me the problem I was having. The more
parsimonious method is clearly preferable. Thanks for your help.
Lindsay
On Wed, Sep 23, 2009 at 12:18 PM, Nick Cox <[email protected]> wrote:
> First off, Lindsay's obvious need is to concatenate string variables. That can and should be done directly using + or -egen, concat()-. There is never any need to convert to numeric and then do the equivalent arithmetic.
>
> gen hhidpn = hhid + pn
>
> should suffice, and if not, there will be small work-arounds. (If Lindsay absolutely needs a numeric version of that result, then -destring- is the way to go, subject to the point below.)
>
> Second, as Martin has hinted, the problem is not in -destring-, an excellent command, but in the way it is being used.
>
> Lindsay insisted on -float- results, but needs extra precision to carry each digit precisely. Don't use -float- therefore.
>
> Third, it seems that the -float- option could be documented in more detail to warn of this problem (or undocumented so that only more experienced users discover it!). Otherwise users could be punished, like Lindsay, in being given what they ask for.
>
> Nick
> [email protected]
>
> Martin Weiss
>
> Also note http://www.ats.ucla.edu/stat/stata/faq/longid.htm
> and
> http://www.stata.com/support/faqs/data/prec.html
>
> Lindsay
>
> I am using Stata/SE 10.1 and having problems executing what should be
> a really simple operation. My dataset has household id (hhid) and
> person number (pn) variables in string format. I need to combine them
> into one numeric unique identifer of the form hhidpn = hhid*1000+pn in
> order to merge them with other data. After I destring the original
> IDs (which appears to work fine) and perform this operation, some of
> the identifiers are duplicates. It looks like Stata is somehow adding
> some of the numbers incorrectly (mostly they are +/-1 from what they
> should be). I have copied some of the output below.
>
> I've also tried adding the two string variables first and then
> destringing and I get the same problem. The string variable with both
> IDs combined looks right, but after I destring some are wrong. Any
> suggestions what might be going on?
>
> Thanks, Lindsay
>
> /**** FIRST METHOD (DESTRING THEN ADD) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . destring HHID, gen(hhid) float;
> HHID has all characters numeric; hhid generated as float
>
> . destring PN, gen(pn) float;
> PN has all characters numeric; pn generated as byte
>
> . replace hhid = hhid*1000;
> (30712 real changes made)
>
> . format hhid %9.0f;
>
> . gen hhidpn = hhid + pn;
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]
>
> +-------------------------------------------+
> HHID hhid PN pn hhidpn
> -------------------------------------------
> 1501. 016973 16973000 031 31 16973032
> 1502. 016973 16973000 032 32 16973032
> 1641. 017530 17530000 040 40 17530040
> 1642. 017530 17530000 041 41 17530040
> 1661. 017641 17641000 011 11 17641012
> -------------------------------------------
> 1662. 017641 17641000 012 12 17641012
> 1666. 017646 17646000 040 40 17646040
> 1667. 017646 17646000 041 41 17646040
> 1679. 017707 17707000 040 40 17707040
> 1680. 017707 17707000 041 41 17707040
> -------------------------------------------
> 1832. 018435 18435000 040 40 18435040
> 1833. 018435 18435000 041 41 18435040
> 1849. 018482 18482000 040 40 18482040
> 1850. 018482 18482000 041 41 18482040
> 1854. 018494 18494000 020 20 18494020
> ....
>
> /**** SECOND METHOD (ADD THEN DESTRING) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . gen HHIDPN = HHID + PN;
>
> . destring HHIDPN, gen(hhidpn) float;
> HHIDPN has all characters numeric; hhidpn generated as float
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1]hhidpn==hhidpn[_n+1]
>
> +--------------------------------------+
> HHID PN HHIDPN hhidpn
> --------------------------------------
> 1501. 016973 031 016973031 16973032
> 1502. 016973 032 016973032 16973032
> 1641. 017530 040 017530040 17530040
> 1642. 017530 041 017530041 17530040
> 1661. 017641 011 017641011 17641012
> --------------------------------------
> 1662. 017641 012 017641012 17641012
> 1666. 017646 040 017646040 17646040
> 1667. 017646 041 017646041 17646040
> 1679. 017707 040 017707040 17707040
> 1680. 017707 041 017707041 17707040
> --------------------------------------
> 1832. 018435 040 018435040 18435040
> 1833. 018435 041 018435041 18435040
> 1849. 018482 040 018482040 18482040
> 1850. 018482 041 018482041 18482040
> 1854. 018494 020 018494020 18494020
> ....
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/