Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: limit to number of digits that can be precisely input into a Stata
From
"Beede, David N" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: limit to number of digits that can be precisely input into a Stata
Date
Tue, 15 May 2012 13:32:15 -0400
Thank you very much, Bill and Nick.
I echo Clyde Schechter's comment on Bill's blog post (I hope it is okay to quote it here):
******
I think the main reason that questions that boil down to precision issues keep popping up on Statalist despite the previously published resources is that there are always new users. And when this problem bites you for the first time, you probably won't think of it as a precision issue, and you won't think to look at those resources. It's hard to "tag" this information in such a way that a new user, naive to precision issues, will find it when searching on his/her own for a solution.
That said, even an "old-timer" like me can get caught on this. I was recently working with a data set in which unique IDs were 10-digit numbers. I had them stored as str10 variables. Unfortunately, because Stata's -sem- command does not support categorical latent variables, I had to do some of my work in MPlus. MPlus reads only fixed or delimited text files, so I used -outfile- to create an MPlus data set. I did my work in MPlus and then wanted to bring the results back into Stata. So I just -insheet-ed the MPlus output into a Stata file and -tostring-ed the IDs. I then tried to -merge- back to my other data and was flabbergasted when most of the IDs came up unmatched! I wasted hours trying to see what was wrong with the unmatched IDs and where they were coming from. Finally, it dawned on me that, having no particular guidance from me on the type specification for these IDs, -insheet- had stored them as floats and we had lost a digit of precision. I changed over to -infile- with a direct format specification of -str10- and everything was fine. (I suppose I could also have stuck with -insheet- and used the -,double- option.)
******
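To make the failure in Clyde's story concrete, here is a minimal sketch in Python (not Stata) that round-trips integers through a 4-byte IEEE 754 float. It assumes that this is a fair stand-in for a float-typed column; the ID values and the helper name through_float32 are made up for illustration.

```python
import struct

def through_float32(n: int) -> int:
    """Round-trip an integer through a 4-byte IEEE 754 float.

    Assumption: this mimics what happens when an ID is stored in a
    single-precision (float) column, which has a 24-bit significand.
    """
    packed = struct.pack("<f", float(n))
    return int(struct.unpack("<f", packed)[0])

# Hypothetical 10-digit IDs that differ only in the last digit.
ids = [1234567890, 1234567891, 1234567892]
stored = [through_float32(i) for i in ids]

for original, kept in zip(ids, stored):
    print(original, "->", kept, "(changed)" if original != kept else "(exact)")

# Integers are exact only up to 2**24 = 16777216 in single precision.
print(through_float32(2**24), through_float32(2**24 + 1))
```

In this sketch all three IDs collapse to the same stored value, which is why a later -merge- on the converted IDs would fail to match.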
I was very lucky that Census block id numbers are only 15 characters long and that -insheet- saved them as doubles. I would suggest putting a warning into the documentation for -insheet-.
Nick's advice is good - long numeric id numbers should be read in as text!
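As a rough illustration of why reading long IDs as text is safer, the following Python sketch parses the same column once as strings and once as double-precision numbers. The 20-digit IDs and the column names block_id and pop are hypothetical, and the sketch only mirrors the idea, not the behaviour of -insheet- itself.

```python
import csv
import io

# Hypothetical delimited data with 20-digit IDs that differ in the last digit.
raw = (
    "block_id,pop\n"
    "12345678901234567890,10\n"
    "12345678901234567891,20\n"
)

rows = list(csv.DictReader(io.StringIO(raw)))

# Kept as text, the IDs stay distinct.
as_text = [row["block_id"] for row in rows]

# Converted to doubles, both IDs round to the same value.
as_number = [float(row["block_id"]) for row in rows]

print("distinct as text:  ", len(set(as_text)))
print("distinct as number:", len(set(as_number)))
```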
>
From "William Gould, StataCorp LP" <[email protected]>
To [email protected]
Subject Re: st: limit to number of digits that can be precisely input into a Stata
Date Tue, 15 May 2012 10:16:13 -0500
________________________________________
>David Beede asked,
>
>> Is 15 the maximum number of digits one can [...] precisely[ ..]
>> input into a Stata dataset using -insheet-?
>
>The maximum value that can be stored precisely is
>
> 9,007,199,254,740,992
>
>or, without commas,
>
> 9007199254740992
>
>There are 16 digits in the number, but 16-digit numbers greater
>than 9,007,199,254,740,992 are rounded.
>
>As Nick Cox <[email protected]> mentioned, see
>http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>
>9*10^15 would be pronounced as 9 quadrillion here in America
>if we could remember how; I had to look it up on Wikipedia.
>
>We computer geeks would be more likely to say the number as
>9 peta-something, as in "Joey has 9 peta-ducks".
>
>I mention this as a way to help everyone remember the limit: a little
>over 9*10^15. Here a "little" is a mere 7,199,254,740,992.
>
>-- Bill
>[email protected]
>
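Bill's limit of 9,007,199,254,740,992 is 2^53, the point up to which every integer fits exactly in an 8-byte IEEE 754 double. The sketch below checks this in Python, whose float type is a double on all common platforms; it assumes Stata's double storage type behaves the same way, and the helper name survives_double is made up for illustration.

```python
LIMIT = 2 ** 53  # 9007199254740992, the value quoted above

def survives_double(n: int) -> bool:
    """Return True if the integer n is unchanged by conversion to a double."""
    return int(float(n)) == n

print(LIMIT)
print(survives_double(LIMIT - 1))  # below the limit
print(survives_double(LIMIT))      # the limit itself
print(survives_double(LIMIT + 1))  # first integer that gets rounded
print(survives_double(LIMIT + 2))  # some larger integers still fit exactly

# The "little" over 9*10^15 mentioned above.
print(LIMIT - 9 * 10 ** 15)
```

Some integers above the limit, such as even ones just past it, are still represented exactly; the point is that beyond 2^53 this can no longer be relied on for every ID.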