Statalist



RE: st: RE: AW: problem with destring


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: AW: problem with destring
Date   Thu, 24 Sep 2009 15:49:20 +0100

Indeed. When you go 

generate <newvar> = <numeric expression> 

by default the new variable is a -float-, so the same issue arises. You need to be explicit that you want a -double- or -long- instead. 

generate double <newvar> = <numeric expression>

generate long <newvar> = <numeric expression>
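
A minimal sketch of the difference, assuming the default storage type has not been changed with -set type-. The two observations use values from Lindsay's listing; the names hhidpn_f and hhidpn_d are made up for this illustration.

```stata
* Two people in the same household, values taken from the thread
clear
set obs 2
generate long hhid = 16973
generate int pn = 30 + _n

* Default storage type (float): both observations end up with the same value
generate hhidpn_f = hhid*1000 + pn

* Explicit double: every digit is kept, so the identifiers stay distinct
generate double hhidpn_d = hhid*1000 + pn

format hhidpn_f hhidpn_d %12.0f
list

* isid exits with an error unless the variable uniquely identifies observations
isid hhidpn_d
```

Here -generate long- would serve equally well, since the combined identifier is an integer well inside the range of a -long-.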

Nick 
[email protected] 

Lindsay

Thank you, Martin and Nick, for your help.  When I first concatenate
the string variables and then destring without specifying float, I get
what I need.

Using my first method (which I now realize was unnecessary) even
without specifying float when using -destring- my hhid variable is
stored as long and my pn variable as int, though when I combine them
(hhidpn = hhid*1000+pn) Stata was storing this as float, which was
what originally gave me the problem I was having.  The more
parsimonious method is clearly preferable.  Thanks for your help.

On Wed, Sep 23, 2009 at 12:18 PM, Nick Cox <[email protected]> wrote:

> First off, Lindsay's obvious need is to concatenate string variables. That can and should be done directly using + or -egen, concat()-. There is never any need to convert to numeric and then do the equivalent arithmetic.
>
> gen hhidpn = hhid + pn
>
> should suffice, and if not, there will be small work-arounds. (If Lindsay absolutely needs a numeric version of that result, then -destring- is the way to go, subject to the point below.)
>
> Second, as Martin has hinted, the problem is not in -destring-, an excellent command, but in the way it is being used.
>
> Lindsay insisted on -float- results, but needs extra precision to carry each digit precisely. Don't use -float- therefore.
>
> Third, it seems that the -float- option could be documented in more detail to warn of this problem (or undocumented so that only more experienced users discover it!). Otherwise users could be punished, like Lindsay, in being given what they ask for.
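
The advice above can be sketched as follows. This is an illustration only: the two observations are typed in by hand from the listing in the original post, and without the -float- option -destring- is left to pick a storage type that holds the values exactly.

```stata
* Hand-entered sample of the string identifiers from the original post
clear
input str6 HHID str3 PN
"016973" "031"
"016973" "032"
end

* Concatenate the strings directly; no arithmetic is needed
generate HHIDPN = HHID + PN

* Convert to numeric without forcing -float-
destring HHIDPN, generate(hhidpn)

format hhidpn %12.0f
list

* Confirm that the numeric identifier is unique
isid hhidpn
```

-egen HHIDPN = concat(HHID PN)- is the alternative Nick mentions for the concatenation step.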

Martin Weiss

> Also note http://www.ats.ucla.edu/stat/stata/faq/longid.htm
> and
> http://www.stata.com/support/faqs/data/prec.html

Lindsay

> I am using Stata/SE 10.1 and having problems executing what should be
> a really simple operation.  My dataset has household id (hhid) and
> person number (pn) variables in string format.  I need to combine them
> into one numeric unique identifier of the form hhidpn = hhid*1000+pn in
> order to merge them with other data.  After I destring the original
> IDs (which appears to work fine) and perform this operation, some of
> the identifiers are duplicates.  It looks like Stata is somehow adding
> some of the numbers incorrectly (mostly they are +/-1 from what they
> should be).  I have copied some of the output below.
>
> I've also tried adding the two string variables first and then
> destringing and I get the same problem.  The string variable with both
> IDs combined looks right, but after I destring some are wrong.  Any
> suggestions what might be going on?
>
> Thanks, Lindsay
>
> /**** FIRST METHOD (DESTRING THEN ADD) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . destring HHID, gen(hhid) float;
> HHID has all characters numeric; hhid generated as float
>
> . destring PN, gen(pn) float;
> PN has all characters numeric; pn generated as byte
>
> . replace hhid = hhid*1000;
> (30712 real changes made)
>
> . format hhid %9.0f;
>
> . gen hhidpn = hhid + pn;
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list  HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
>
>       +-----------------------------------------+
>       |   HHID       hhid    PN   pn     hhidpn |
>       |-----------------------------------------|
> 1501. | 016973   16973000   031   31   16973032 |
> 1502. | 016973   16973000   032   32   16973032 |
> 1641. | 017530   17530000   040   40   17530040 |
> 1642. | 017530   17530000   041   41   17530040 |
> 1661. | 017641   17641000   011   11   17641012 |
>       |-----------------------------------------|
> 1662. | 017641   17641000   012   12   17641012 |
> 1666. | 017646   17646000   040   40   17646040 |
> 1667. | 017646   17646000   041   41   17646040 |
> 1679. | 017707   17707000   040   40   17707040 |
> 1680. | 017707   17707000   041   41   17707040 |
>       |-----------------------------------------|
> 1832. | 018435   18435000   040   40   18435040 |
> 1833. | 018435   18435000   041   41   18435040 |
> 1849. | 018482   18482000   040   40   18482040 |
> 1850. | 018482   18482000   041   41   18482040 |
> 1854. | 018494   18494000   020   20   18494020 |
> ....
>
> /**** SECOND METHOD (ADD THEN DESTRING) ****/
> . use "${dir}Geographic Identifiers\RGEO.dta", clear;
>
> . gen HHIDPN = HHID + PN;
>
> . destring HHIDPN, gen(hhidpn) float;
> HHIDPN has all characters numeric; hhidpn generated as float
>
> . format hhidpn %9.0f;
>
> . sort hhidpn
>
> . list  HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]
>
>       +-------------------------------------+
>       |   HHID    PN      HHIDPN     hhidpn |
>       |-------------------------------------|
> 1501. | 016973   031   016973031   16973032 |
> 1502. | 016973   032   016973032   16973032 |
> 1641. | 017530   040   017530040   17530040 |
> 1642. | 017530   041   017530041   17530040 |
> 1661. | 017641   011   017641011   17641012 |
>       |-------------------------------------|
> 1662. | 017641   012   017641012   17641012 |
> 1666. | 017646   040   017646040   17646040 |
> 1667. | 017646   041   017646041   17646040 |
> 1679. | 017707   040   017707040   17707040 |
> 1680. | 017707   041   017707041   17707040 |
>       |-------------------------------------|
> 1832. | 018435   040   018435040   18435040 |
> 1833. | 018435   041   018435041   18435040 |
> 1849. | 018482   040   018482040   18482040 |
> 1850. | 018482   041   018482041   18482040 |
> 1854. | 018494   020   018494020   18494020 |
> ....
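
The off-by-one pattern in both listings can be reproduced without any data. A -float- carries a 24-bit significand, so above 2^24 = 16,777,216 only even integers are stored exactly and each odd identifier is rounded to an even neighbour. The lines below are a small check of that, using the -float()- function to round a value to float precision; the numbers are taken from the listings above.

```stata
* Largest power of two below which every integer fits in a float
display %12.0f 2^24

* An odd identifier is rounded to an even neighbour
display %12.0f float(16973031)
display %12.0f float(17530041)

* An even identifier is stored exactly
display %12.0f float(16973032)

* So two distinct identifiers compare as equal once held as floats
display float(16973031) == float(16973032)
```

The second and fourth lines both display 16973032, and the last line displays 1, which is why persons 031 and 032 in household 016973 appear as duplicates.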



