Jean Marie Linhart
>
> Nick Cox wrote:
>
> > 1. To keep every digit in a numeric identifier that is interpretable
> > as an integer, use -long- not -double-. The very large numbers which
> > can approximately be held in a -double- obscure the fact that
> > even 8-digit integers cannot all be held exactly, giving rise to
> > anomalies such as those you experienced.
>
> I think Nick mistyped here. He meant that doubles cannot hold all
> 16-digit integers. They do just fine with 8-digit integers.
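The corrected claim can be made concrete with a small Python sketch. Python is used only because its float type is an IEEE double; this is not Stata code. The sketch round-trips integers through a double. Every integer up to 2^53 survives, which covers all 8-digit and all 15-digit integers, while some 16-digit integers do not. The helper name exact_in_double is hypothetical.

```python
# A double has a 53-bit significand (52 stored bits plus an implicit leading 1),
# so every integer n with |n| <= 2**53 converts to float without loss.
LIMIT = 2**53  # 9007199254740992, a 16-digit number

def exact_in_double(n: int) -> bool:
    """True if the integer n survives a round trip through a Python float (an IEEE double)."""
    return int(float(n)) == n

print(exact_in_double(99_999_999))             # largest 8-digit integer: True
print(exact_in_double(LIMIT))                  # True
print(exact_in_double(LIMIT + 1))              # smallest positive integer that is lost: False
print(exact_in_double(9_999_999_999_999_999))  # a 16-digit integer: False
print(int(float(9_999_999_999_999_999)))       # 10000000000000000
```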
>
> Why is this?
>
> If I can explain this coherently and without any typos, IEEE double
> precision numbers have 64 bits (binary digits) in total, broken down
> into 1 bit for the sign, 11 bits for the binary exponent and 52 bits
> for the binary fraction. It is the binary fraction that determines
> the precision. The binary fraction is intended to represent a binary
> number at least 1 and less than 2, i.e., there is an assumed 1 at the
> front: we really have 1.F, where F is the fractional part stored in
> the 52 bits. Any nonzero number can be written this way by choosing
> the correct exponent. This gives us a precision of 1/2^53 (about
> 1.1e-16). Since 1e-15 > 1/2^53 > 1e-16, we expect to get 15
> significant digits. Sometimes we will get 16, but not always.
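The field layout described above can be checked with a minimal Python sketch. Python is used only because its float is an IEEE double; this is not Stata code. The sketch uses the standard struct module to read the sign, exponent and fraction bits of a double, then recombines them as 1.F times a power of two. The exponent bias of 1023 is a detail of the IEEE 754 standard that the message does not mention. The hypothetical helper rebuild() handles only normal numbers, not zero, subnormals, infinities or NaN.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 double into (sign bit, biased exponent, 52-bit fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # reinterpret the 8 bytes as an integer
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)    # 52 bits: the F in 1.F
    return sign, exponent, fraction

def rebuild(sign: int, exponent: int, fraction: int) -> float:
    """Recombine the fields of a normal double as (-1)^sign * 1.F * 2^(exponent - 1023)."""
    significand = 1 + fraction / 2**52   # the implicit leading 1 plus the stored fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

print(double_fields(1.0))        # (0, 1023, 0)
print(double_fields(-2.0))       # (1, 1024, 0)
print(2**-53)                    # about 1.11e-16, between 1e-16 and 1e-15
print(1.0 + 2**-53 == 1.0)       # True: 1 + 2**-53 rounds back to 1.0
print(1.0 + 2**-52 == 1.0)       # False: 2**-52 is one unit in the last place of 1.0
```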
>
> For more information, you may like to see:
>
> http://www.scri.fsu.edu/~jac/MAD3401/Backgrnd/ieee.html
>
> Or do web searches on "IEEE floating point".
Thanks for the correction and detailed analysis. FWIW, I was picking up
on Ann Flanagan's original report:
> I have a set of data with a string variable 13 characters in length,
> containing a unique school district identifier. The first eight
> characters, some of which have a leading zero, identify the district;
> the remaining five characters identify the schools within the
> districts. I need the district identifier to be "real" for collapsing
> the data to the district level. Here's what I do:
>
> gen str8 district = substr(rcds, 1, 8)
> gen double dno = real(district)
> format dno %08.0f
>
> When I list the data and/or run -xtgee- on the dataset, there are
> rounding errors such that
>
> rcds==4000704000001
> rcds==4000704200001
>
> both return a district number of 40007040, and I lose districts in
> estimation.
That is, Ann reported that real("40007040") and real("40007042")
are both held as 40007040 in a double. However, a check confirms
Jean-Marie's analysis: this is not true, so there is a small
puzzle remaining here.
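One possible explanation of the puzzle is offered here as an assumption; the thread does not establish it. Stata's default storage type for new variables is the 4-byte -float-, which has a 24-bit significand. Integers above 2^24 = 16,777,216 are therefore no longer all exact, and that range includes some 8-digit integers. If the district number passed through a -float- at any point, 40007042 would round to 40007040. The Python sketch below emulates a 4-byte float with the standard struct module. It applies the equivalent of substr(rcds, 1, 8) to the two rcds values from the report. The helper to_float32 is hypothetical.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest IEEE single-precision (4-byte) value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# The two identifiers from the report; the first 8 characters are the district.
rcds = ["4000704000001", "4000704200001"]
districts = [r[:8] for r in rcds]                # like substr(rcds, 1, 8)

as_double = [float(d) for d in districts]        # 8-byte double: 53-bit significand
as_single = [to_float32(v) for v in as_double]   # 4-byte float: 24-bit significand

print(districts)                 # ['40007040', '40007042']
print(as_double)                 # [40007040.0, 40007042.0]  (distinct)
print(as_single)                 # [40007040.0, 40007040.0]  (collide)
print(to_float32(2.0**24 + 1))   # 16777216.0: 2**24 + 1 is the smallest positive integer a 4-byte float cannot hold
```

The double results agree with the check reported above. The single-precision results reproduce the collision Ann described. Whether a -float- was actually involved in her workflow remains an open question.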
Nick
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/