Statalist The Stata Listserver



RE: st: strange behavior of int() function...not truncating properly


From   n j cox <[email protected]>
To   [email protected]
Subject   RE: st: strange behavior of int() function...not truncating properly
Date   Mon, 29 Jan 2007 22:56:21 +0000

This is at root nothing to do with -int()-. The issue
is the expectation that computations to finite precision
are always exact. On the contrary, they almost always
entail approximations and occasionally the results cause
surprise. Also, it is salutary to recall that while you
can do arithmetic with pencil and paper and use base 10
ideas, Stata is using base 2 approximations and quite
different algorithms. The two should come close, but
they are not guaranteed identical.
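Here is a quick way to see the gap between the pencil-and-paper value and the binary one. It is sketched in Python rather than Stata, on the assumption that Python's float (an IEEE 754 double) behaves like a Stata -double-. `fractions.Fraction` recovers the exact rational number that a float actually holds, so that number can be compared with the decimal 9914/100.

```python
from fractions import Fraction

# Fraction(float) gives the exact value the double holds: an integer over a power of 2.
stored = Fraction(99.14)
intended = Fraction(9914, 100)        # the decimal value, automatically reduced to 4957/50

error = stored - intended
print(stored.denominator == 2**46)    # True
print(error)                          # 1/1759218604441600, i.e. 1/(25 * 2**46)
print(float(error))                   # roughly 5.7e-16: tiny, but not zero
```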

The display format %9.5f shows too few decimal places to reveal what is happening.

We, humans who know arithmetic, can see that int(99.14 * 100)/100
should be (is!) 9914 / 100 = 99.14. But Stata does not look
at the formula and use its knowledge. It has no knowledge.
It is a machine and goes for the best binary approximation
it can find. Here is the story in hexadecimal:

. di %21x int(99.14 * 100)/100
+1.8c8f5c28f5c29X+006

No, it's not transparent to me either, but this is the
closest that users can get to seeing how Stata thinks
of this problem. Here is a decimal representation
of that:

. di %21.18f int(99.14 * 100)/100
99.140000000000001000

and with the format used this is acceptable as the
right answer. But that is not what Zach did. By default
-generate- produces float variables, which hold only 24
significant bits (about 7 decimal digits) against 53 for
a double, and that is not enough to get what Zach sees
as being the right answer.

. set obs 1
obs was 0, now 1

. gen v1 = int(99.14 * 100)/100

. di %21.18f v1[1]
99.139999389648437000

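This is only a smidgen under 99.14, but the difference
is enough to be noticeable in Zach's results. In his v2,
100 times that float is 9913.99994 (the multiplication
is done in double), and -int()- duly truncates that to
9913, giving 99.13. In his v3, the product was first
stored in a float variable. The nearest float to
9913.99994 is 9914 itself, so -int()- had nothing to
truncate: hence "0 real changes made".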
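Zach's v2 and v3 can be replayed step by step in Python. This sketch rests on two assumptions. First, Stata evaluates each expression in double precision. Second, each -generate- or -replace- into a float variable then rounds the result to 4-byte precision, mimicked here by the hypothetical helper `to_float32`.

```python
import struct

def to_float32(x: float) -> float:
    """Hypothetical stand-in for storing a value in a Stata float variable."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# gen v1 = 99.1400000  -> stored as a float, slightly below 99.14
v1 = to_float32(99.14)

# gen v2 = int(v1 * 100)/100  -> product computed in double, truncated, then stored
product = v1 * 100                  # 9913.99993896484375, just under 9914
v2 = to_float32(int(product) / 100)

# gen v3 = v1 * 100; replace v3 = int(v3); replace v3 = v3 / 100
v3 = to_float32(v1 * 100)           # the nearest float to 9913.9999... is 9914.0
v3 = to_float32(int(v3))            # no change: already a whole number
v3 = to_float32(v3 / 100)

print(f"{v1:.5f} {v2:.5f} {v3:.5f}")   # 99.14000 99.13000 99.14000, as in Zach's listing
print(f"{v1:.18f}")                     # 99.139999389648437500
```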

With a -double-, you can reproduce what I did with -display-:

. gen double V1 = int(99.14 * 100)/100

. di %21.18f V1[1]
99.140000000000001000
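Python's floats are always doubles, and there the same round trip works for 99.14. Doubles shrink the problem rather than remove it, though. This sketch shows that with 0.29, a value chosen purely for illustration (it is not from Zach's data).

```python
# With doubles throughout, 99.14 survives the round trip ...
v1 = 99.14
print(int(v1 * 100) / 100 == v1)   # True: 99.14 * 100 rounds to exactly 9914.0

# ... but other values still fall just short of the integer in double precision.
x = 0.29
print(x * 100)                     # 28.999999999999996
print(int(x * 100) / 100)          # 0.28, not 0.29
```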

Otherwise put, 14/100 = 7/50 is an exact decimal, but its binary
representation is an infinite, repeating string of bits, so any
finite binary representation, float or double, can only approximate it.
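Binary long division shows that 7/50 never terminates in base 2. Record each remainder as you go; once one recurs, the bits from that point on repeat forever. The function `binary_expansion` is a hypothetical helper written in Python for this illustration.

```python
from fractions import Fraction

def binary_expansion(q: Fraction):
    """Hypothetical helper: split q >= 0 into (integer part, pre-period bits, repeating bits)."""
    whole, r = divmod(q.numerator, q.denominator)
    d = q.denominator
    seen = {}            # remainder -> index of the bit it is about to produce
    bits = []
    while r != 0 and r not in seen:
        seen[r] = len(bits)
        r *= 2                       # shift one binary place
        bits.append(r // d)          # next bit is 0 or 1
        r %= d
    if r == 0:
        return whole, bits, []       # terminating expansion
    start = seen[r]                  # the cycle begins where this remainder first appeared
    return whole, bits[:start], bits[start:]

whole, pre, rep = binary_expansion(Fraction(9914, 100))
print(whole, pre, len(rep))          # 99 [0] 20
print("".join(map(str, rep)))        # 01000111101011100001, repeated without end
```

A period of 20 bits means that no finite number of bits (24 for a float, 53 for a double) can hold 0.14 exactly. Each type just cuts the cycle off at a different place.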

The same issue is discussed in the FAQ "Results of the
mod(x,y) function" (2/03): "Why does the mod(x,y) function
sometimes give puzzling results? Why is mod(0.3,0.1) not
equal to 0?", at
http://www.stata.com/support/faqs/data/mod.html

and in a more recent Mata matters column by William Gould.
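The FAQ's mod(0.3,0.1) example behaves the same way. This Python sketch assumes that Stata defines mod(x,y) as x - y*floor(x/y), and mimics that with a hypothetical `stata_mod`. In binary, 0.3/0.1 comes out a shade under 3.

```python
import math

def stata_mod(x: float, y: float) -> float:
    """Hypothetical mimic of Stata's mod(), assuming mod(x,y) = x - y*floor(x/y)."""
    return x - y * math.floor(x / y)

print(0.3 / 0.1)            # 2.9999999999999996, so floor() gives 2, not 3
print(stata_mod(0.3, 0.1))  # 0.09999999999999998 rather than 0
print(stata_mod(7.0, 3.0))  # 1.0: exact when every operand is exactly representable
```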

Nick
[email protected]

Zachary Harrison

Here is a very simplified example demonstrating how
int() appears to not be properly truncating. What am
I missing here?

. set obs 1
obs was 0, now 1

. gen v1 = 99.1400000

. format v1 %9.5f

. list

     +----------+
     |       v1 |
     |----------|
  1. | 99.14000 |
     +----------+

. gen v2 = int(v1 * 100)/100

. gen v3 = v1 * 100

. replace v3 = int(v3)
(0 real changes made)

. replace v3 = v3 / 100
(1 real change made)

. format v2 %9.5f

. format v3 %9.5f

. list

     +--------------------------------+
     |       v1         v2         v3 |
     |--------------------------------|
  1. | 99.14000   99.13000   99.14000 |
     +--------------------------------+

I realize I can do my truncation in more than one step,
as v3 does, but I would like to know what is different
here. I also realize, of course, that truncating 99.14 to
two decimal places should change nothing!

I am using Intercooled Stata 8.2 for Windows.



