This difficulty is often flagged on this list. For example, see the thread that started only a few weeks ago with
<http://www.stata.com/statalist/archive/2009-09/msg00927.html>
This thread is especially relevant as the underlying problem is the same, handling composite identifiers formed by concatenation.
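For what it is worth, the concatenation route can be sketched as follows. This is only a minimal sketch on made-up toy data, assuming -mun- and -loc- are strings already padded with leading zeros as in the original post; the name -clave_str- is hypothetical, and the state code is padded to two digits, which the original post did not ask for.

```stata
* Toy data (hypothetical values; variable names follow the original post)
clear
input byte state str3 mun str4 loc
 2 "001" "0001"
 2 "001" "0139"
 1 "001" "0001"
end

* Pad the state code to two digits, then concatenate the strings.
* A string identifier is held exactly: no numeric precision is involved.
generate str9 clave_str = string(state, "%02.0f") + mun + loc

* Check that the result identifies the observations uniquely
isid clave_str
list, noobs
```

A string identifier cannot be rounded, at the cost of 9 bytes per observation here.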
A more general point is that one can miss valuable stuff by treating Statalist as write-only.
I don't fully agree with Stas about the default. Setting -double- as the default avoids certain problems only to create others, notably inefficiency and storage. Stas is of course perfectly at liberty to change the default for his purposes, but that doesn't make -double- necessarily a good default for all users.
Nick
[email protected]
Stas Kolenikov
Read -help datatypes- to see the relative accuracy of the
stored numbers. The default -float- type (which is a terrible default
if you ask me) stores numbers with about 4e-8 relative accuracy, so it
holds integers exactly only up to 16,777,216 (2^24). Your
multiplication by 1e7 produces values beyond that limit, so the
locality digits fall within the round-off error for this
type. I forgot about these troubles ages ago after putting the
line
set type double
into my profile.do file in the Stata directory.
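The rounding can be checked directly, without any data in memory. A minimal sketch, assuming only the built-in -float()- function, which rounds its argument to -float- precision; the two test values are the ones from the original post, and the first two lines should reproduce the 20010000 and 20010140 reported there.

```stata
* float() rounds its argument to -float- precision
display %12.0f float(20010001)
display %12.0f float(20010139)

* Integers are held exactly in a -float- only up to 2^24
display %12.0f 2^24

* The same value is held exactly in double precision
display %12.0f 20010001
```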
On the other hand, a -double- type variable still makes a rather poor
identifier, so you might want to -generate- your compound ID variables
as -long-:
gen long claveloc = state*10000000 + mun2*10000 + loc2
Again, make sure that you can still store all the numbers
accurately and that the largest ID you could ever need does not exceed
c(maxlong), about 2.1 billion:
. di %12.0g c(maxlong)
2147483620
If you have fewer than 214 states, you should be good to go with -long- :))
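Putting the pieces together, a minimal sketch on made-up toy data (the third row, with state 32 and the largest possible municipality and locality codes, is a hypothetical stress case, not taken from the original post). It builds the -long- identifier and then checks that nothing was lost; the comment about out-of-range values reflects my understanding of how -generate- treats integer types and is worth verifying on your own data.

```stata
* Toy data (hypothetical values; variable names follow the original post)
clear
input byte state str3 mun str4 loc
 2 "001" "0001"
 2 "001" "0139"
32 "999" "9999"
end

* Ask for -long- explicitly; the session default type is left alone
generate long claveloc = state*10000000 + real(mun)*10000 + real(loc)
format claveloc %12.0f

* Values out of range for -long- should end up missing, so check for that
assert !missing(claveloc)
assert claveloc <= c(maxlong)

* Confirm that the identifier is unique across observations
isid claveloc
list, noobs
```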
On Fri, Oct 16, 2009 at 9:27 PM, Kanter, Rebecca <[email protected]> wrote:
> I am trying to combine three numbers (a 1-2 digit code for state, a 3 digit code for municipality, and a 4 digit code for locality) into a single code that is unique for each state in a country. I have tried this various ways, and each time, after the 1st state, Stata starts to round (I think) some of the numbers. No state, municipality, or locality is missing. State is byte. Municipality and locality are strings (that I convert to numeric, see below).
>
> gen munloc=mun+loc
> destring munloc, generate(test)
> generate ent2=ent*10000000
> generate claveloc=ent2+test
>
> or whereby:
> mun2=real(mun)
> loc2=real(loc)
> gen claveloc =((state*10000000)+ (mun2*10000))+loc2
>
> ****Any way I try this, I get problems like this:
>
> state  mun  loc   munloc   test   state2    claveloc
>     2  001  0001  0010001  10001  20000000  20010000  (should be 20010001)
>     2  001  0139  0010139  10139  20000000  20010140  (should be 20010139)
>
> *Should be like this:
>
> state  mun  loc   munloc   test   state2    claveloc
>     1  001  0001  0010001  10001  10000000  10010001
>