Statalist



st: AW: problem with destring


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   st: AW: problem with destring
Date   Wed, 23 Sep 2009 16:02:07 +0200




Have you thought about -egen, group()- as an alternative to your method?



HTH
Martin
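
Martin's suggestion could look roughly like the following. This is a minimal sketch, assuming the string variables HHID and PN from the original post are in memory; the new variable name id is hypothetical.

```stata
* Assumes HHID and PN (string variables) are already in memory.
* id is a hypothetical name for the new identifier.
egen long id = group(HHID PN)

* isid exits with an error if id does not uniquely identify observations
isid id
```

Note that -egen, group()- numbers the groups 1, 2, ... within the current dataset, so the resulting id would generally not match an identifier built the same way in a different file; for merging, the original HHID and PN would still be needed as keys.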


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lindsay
Sent: Wednesday, 23 September 2009 16:01
To: [email protected]
Subject: st: problem with destring

I am using Stata/SE 10.1 and having problems executing what should be
a really simple operation.  My dataset has household id (hhid) and
person number (pn) variables in string format.  I need to combine them
into one numeric unique identifier of the form hhidpn = hhid*1000+pn in
order to merge them with other data.  After I destring the original
IDs (which appears to work fine) and perform this operation, some of
the identifiers are duplicates.  It looks like Stata is somehow adding
some of the numbers incorrectly (mostly they are +/-1 from what they
should be).  I have copied some of the output below.

I've also tried adding the two string variables first and then
destringing and I get the same problem.  The string variable with both
IDs combined looks right, but after I destring some are wrong.  Any
suggestions what might be going on?

Thanks, Lindsay

/**** FIRST METHOD (DESTRING THEN ADD) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;

. destring HHID, gen(hhid) float;
HHID has all characters numeric; hhid generated as float

. destring PN, gen(pn) float;
PN has all characters numeric; pn generated as byte

. replace hhid = hhid*1000;
(30712 real changes made)

. format hhid %9.0f;

. gen hhidpn = hhid + pn;

. format hhidpn %9.0f;

. sort hhidpn

. list  HHID hhid PN pn hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]

      +-----------------------------------------+
      |   HHID       hhid    PN   pn     hhidpn |
      |-----------------------------------------|
1501. | 016973   16973000   031   31   16973032 |
1502. | 016973   16973000   032   32   16973032 |
1641. | 017530   17530000   040   40   17530040 |
1642. | 017530   17530000   041   41   17530040 |
1661. | 017641   17641000   011   11   17641012 |
      |-----------------------------------------|
1662. | 017641   17641000   012   12   17641012 |
1666. | 017646   17646000   040   40   17646040 |
1667. | 017646   17646000   041   41   17646040 |
1679. | 017707   17707000   040   40   17707040 |
1680. | 017707   17707000   041   41   17707040 |
      |-----------------------------------------|
1832. | 018435   18435000   040   40   18435040 |
1833. | 018435   18435000   041   41   18435040 |
1849. | 018482   18482000   040   40   18482040 |
1850. | 018482   18482000   041   41   18482040 |
1854. | 018494   18494000   020   20   18494020 |
....

/**** SECOND METHOD (ADD THEN DESTRING) ****/
. use "${dir}Geographic Identifiers\RGEO.dta", clear;

. gen HHIDPN = HHID + PN;

. destring HHIDPN, gen(hhidpn) float;
HHIDPN has all characters numeric; hhidpn generated as float

. format hhidpn %9.0f;

. sort hhidpn

. list  HHID PN HHIDPN hhidpn if hhidpn==hhidpn[_n-1] | hhidpn==hhidpn[_n+1]

      +-------------------------------------+
      |   HHID    PN      HHIDPN     hhidpn |
      |-------------------------------------|
1501. | 016973   031   016973031   16973032 |
1502. | 016973   032   016973032   16973032 |
1641. | 017530   040   017530040   17530040 |
1642. | 017530   041   017530041   17530040 |
1661. | 017641   011   017641011   17641012 |
      |-------------------------------------|
1662. | 017641   012   017641012   17641012 |
1666. | 017646   040   017646040   17646040 |
1667. | 017646   041   017646041   17646040 |
1679. | 017707   040   017707040   17707040 |
1680. | 017707   041   017707041   17707040 |
      |-------------------------------------|
1832. | 018435   040   018435040   18435040 |
1833. | 018435   041   018435041   18435040 |
1849. | 018482   040   018482040   18482040 |
1850. | 018482   041   018482041   18482040 |
1854. | 018494   020   018494020   18494020 |
....
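
A likely explanation, offered as a sketch rather than a definitive diagnosis: Stata's float type is a 4-byte single-precision number, which cannot hold every integer above 16,777,216 (2^24) exactly. The combined IDs in both listings above are around 17 million, so odd values would be rounded to a neighboring even integer, which is consistent with the +/-1 shifts and the resulting duplicates. The example below rebuilds two of the listed IDs in a toy dataset and compares float with double storage; the variable names bad and good are hypothetical.

```stata
* Toy data reproducing observations 1501 and 1502 from the listings
clear
set obs 2
generate str6 HHID = "016973"
generate str3 PN = "031" in 1
replace PN = "032" in 2

* Without the float option, destring picks a type that holds the values exactly
destring HHID, generate(hhid)
destring PN, generate(pn)

* Same arithmetic, two storage types (bad and good are hypothetical names)
generate float  bad  = hhid*1000 + pn
generate double good = hhid*1000 + pn

format bad good %12.0f
list HHID PN bad good
```

If float precision is indeed the cause, bad should show 16973032 in both rows while good keeps 16973031 and 16973032 distinct. In the real data, dropping the float option from -destring- and creating hhidpn with -generate double- (or -generate long-) should then remove the duplicates, and -isid hhidpn- would confirm uniqueness.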



--
Lindsay Sabik
Doctoral Candidate in Health Policy
Harvard University
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

