Re: st: a question about number precision

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: a question about number precision
Date	Fri, 31 Mar 2006 18:15:33 -0600

On Mar 31, 2006, at 4:33 PM, Jian Zhang wrote:

> I have a problem with number precision and can't figure out what happened. I hope you can help me out. Thanks.
>
> Here is the data:
> ID
> 21557127
>
> Then I ran the following do-file to extract the last three digits from the ID:
>
> gen double temxxx=(ID/1000)
> gen temyyy=int(temxxx)
> gen temzzz=temxxx-temyyy
> gen areaxxx=(temzzz*1000)
> drop temxxx temyyy temzzz
>
> The generated data look like this:
> ID areaxxx
> 21557127 127
>
> However, when I typed: list if areaxxx==127, Stata in fact listed nothing!
>
> At first I thought it might be because areaxxx is a floating-point variable, so I typed: list if areaxxx==float(127). However, Stata again listed nothing.

First, let me say that if all you want to do is to extract the last three digits of the ID, here is the way to do it:

. di real(substr(string(ID,"%12.0g"),-3,.))
127

Note that if you just use string(ID) this will not work, as string() uses a default format which is not wide enough for your ID (%12.0g is the default format for the long storage type, which I presume is how your ID variable is stored).
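
To apply the same idea to every observation rather than displaying a single value, something like the following sketch should work. The variable names last3 and last3_mod are made up for illustration. The mod() line is an alternative that is not part of the suggestion above, and it assumes ID holds exact integers (for example, because it is stored as a long).

```stata
clear
set obs 1
gen long ID = 21557127

* string route: format the ID with %12.0g, then keep the last three characters
gen int last3 = real(substr(string(ID, "%12.0g"), -3, .))

* arithmetic alternative (an assumption, not from the message): remainder mod 1000
gen int last3_mod = mod(ID, 1000)

assert last3 == 127 & last3_mod == 127
```

Both routes return a number, so an ID ending in 007 would give 7, not "007".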

Second, this is exactly the reason why you should not store IDs as numbers -- you should store them as strings instead. For example, if ID were a string variable, then extracting the last three digits would be even simpler:

. di substr(ID,-3,.)
127

and would be guaranteed to work no matter how long your IDs are (provided they are no longer than 244 characters).
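
As a rough sketch of what that looks like when the ID starts out numeric, one could build a string copy first and then take the suffix. The names IDstr and area are hypothetical, and str12 is just a width assumed to be large enough for these IDs.

```stata
clear
set obs 1
gen long ID = 21557127

* convert the numeric ID to a string, using a format wide enough for all digits
gen str12 IDstr = string(ID, "%12.0g")

* last three characters, kept as a string so leading zeros survive
gen str3 area = substr(IDstr, -3, .)

assert area == "127"
```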

Finally, what happened above? The problem was indeed due to the error inherent in floating-point arithmetic. For example, here is the calculation you performed:

. di %24.18f float( 1000 * float( (21557127/1000) - float( int (21557127/1000) ) ) )
127.000007629394531250

which, as you can see, is not equal to 127. Let's take a closer look:

float( 1000 * float( (21557127/1000) - float( int (21557127/1000) ) ) )
                     ---- temxxx ---              ---- temxxx ---
                                       --------- temyyy ----------
              --------------------- temzzz ------------------------
-------------------------------- areaxxx -----------------------------

Notice how I am using the float() function to mimic the fact that, although you created temxxx as a double, you did not do so for the other intermediate variables. Now in this case, had you also created temzzz as a double, you would have gotten what you wanted:

. assert float( 1000 * ( (21557127/1000) - float( int (21557127/1000) ) ) ) == 127

However, as I said above, it is nearly always better to store IDs such as these as string variables.
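
For completeness, here is one way the original do-file could be rewritten so that the comparison succeeds. This is only a sketch: it stores every intermediate as a double, and the round() call is an extra safeguard that is not in the discussion above, added because a double result can still differ from 127 by a tiny amount.

```stata
clear
set obs 1
gen long ID = 21557127

gen double temxxx = ID/1000
gen double temyyy = int(temxxx)
gen double temzzz = temxxx - temyyy
* round() removes whatever tiny error is left after the subtraction
gen int areaxxx = round(temzzz*1000)
drop temxxx temyyy temzzz

list if areaxxx == 127
assert areaxxx == 127
```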

On Mar 31, 2006, at 4:44 PM, Alex Ogan wrote:

> Here's something weird and probably related:
>
> . clear
>
> . set obs 1
> obs was 0, now 1
>
> . gen ID = 21557127
>
> . display ID
> 21557128

Not weird at all. By default, -generate- creates new variables using the float storage type. And rounded to float precision, 21557127 is 21557128:

. di float(21557127)
21557128

Had you instead created the variable as a long, you would have seen what you expected:

. clear

. set obs 1
obs was 0, now 1

. gen long ID = 21557127

. di ID
21557127
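
The underlying limit can be checked directly. A float has a 24-bit significand, so not every integer above 2^24 = 16777216 can be stored exactly. The short sketch below (variable names ID_long and ID_double are illustrative) confirms this and shows that both long and double hold the ID from this thread without change.

```stata
* 2^24 = 16777216 is the last point up to which every integer fits in a float
assert float(16777216) == 16777216
assert float(16777217) != 16777217

* the ID from this thread is above that limit, so float storage changes it
assert float(21557127) == 21557128

* long and double both hold it exactly
clear
set obs 1
gen long   ID_long   = 21557127
gen double ID_double = 21557127
assert ID_long == 21557127 & ID_double == 21557127
```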

-- Phil
