st: RE: Precision in outsheet and outfile
From
"Schaffer, Mark E" <[email protected]>
To
<[email protected]>
Subject
st: RE: Precision in outsheet and outfile
Date
Mon, 28 Feb 2011 19:03:24 -0000
Hi all. A Statalist reader (but not subscriber) wrote to me about his
problems with -outsheet- and precision, and suggested a work-around,
namely -xmlsave-:
> Dear Mark,
>
> I ran into the same problem with Stata with precision and outsheet
> that you noted in the Statalist. I couldn't figure out how to reply
> to the list, so I am writing directly. The loss of precision and the
> lack of documentation is a major problem in my view.
>
> I was able to save with better precision using the xmlsave command.
>
> Let me know if there are other work-arounds that you hear about.
>
> Regards,
>
> Andrew Austin, Ph.D.
> Congressional Research Service
Andrew's main point is that the behaviour of xmluse/xmlsave is the
behaviour we both expected - but we now know isn't there - from
insheet/outsheet, and that these important differences could benefit
from explicit discussion in the Stata documentation for these commands
(and I agree).
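
The round trip Andrew describes might look like the sketch below. It rebuilds the two-observation example from the quoted message; the file name testxml and the doctype(dta) choice are assumptions made for illustration, and whether every digit survives is exactly what is being checked, not something the manual promises.

```stata
* Sketch only: save and reload through -xmlsave-/-xmluse- without
* touching the default %9.0g display format.
clear
input int Year double GDP
1995 9963191
1996 10335489
end
format GDP %9.0g

* testxml is a hypothetical file name; .xml is appended by default
xmlsave testxml, doctype(dta) replace
xmluse testxml, doctype(dta) clear

* widen the format only to inspect what came back
format GDP %12.0f
list
```

If the 1996 value comes back as 10335489 rather than 10300000, the XML route did not pass the data through the display format.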
As a follow-up, I am curious to know more about how xmluse/xmlsave
maintain precision. (I realize this is dangerously close to reopening
the double-debate!) It's not discussed in the manual. Does anyone
know?
--Mark
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> Schaffer, Mark E
> Sent: 11 January 2011 13:40
> To: [email protected]
> Subject: st: Precision in outsheet and outfile
>
> Hi all. I think I've just been bitten by an (almost)
> undocumented "feature" of outsheet (and shared by outfile):
> storage precision is determined by the display format.
>
> I'm using Stata 11.1 for Windows. Stata 10.1 for Windows
> behaves the same way.
>
> For example, in my original data the default display format is %9.0g.
> If I change the format of the relevant variable to %12.0f,
> and then outsheet and insheet, everything is fine:
>
> . format GDP %12.0f
>
> .
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %12.0f
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
>
> .
> . outsheet using testoutfile.csv, replace comma
>
> .
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
>
> But if I don't change the display format, numbers too large to be
> shown in full under %9.0g (in this example, the 8-digit 1996 value)
> lose all but 3 (!!) digits of precision:
>
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
>
> .
> . outsheet using testoutfile.csv, replace comma
>
> .
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
>
>
> This behaviour seems to be shared by outfile, even though I'm
> using Stata's dictionary to specify the datatype:
>
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
>
> .
> . outfile using testoutfile.csv, replace dict
>
> .
> . infile using testoutfile.csv, clear
>
> dictionary {
>     int Year `"Year"'
>     double GDP
> }
>
> (2 observations read)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
>
> So even though Stata's dictionary format notes that GDP is a
> double, all but 3 digits of precision are lost.
>
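> What's happening is that under the default %9.0g display format
> (width 9), Stata switches to exponential notation once a value no
> longer fits (in the example above, at 8 digits). outsheet and
> outfile write the value as displayed, so the 1996 value is
> recorded as 1.03e+07.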
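
The mechanism can be seen without writing any file. The sketch below applies the two formats from the example to the 1996 value and then reads the formatted text back as a number; the local macro names are made up for illustration.

```stata
* Sketch: what a display format does to a value once it becomes text.
local x = 10335489

* default format: exponential form, as in the %9.0g listing above
display %9.0g `x'

* wide fixed format: all digits shown
display %12.0f `x'

* string() applies a display format; real() reads the text back
local written  = string(`x', "%9.0g")
local readback = real("`written'")
display "written as `written', read back as " %12.0f `readback'
```

The text that passes through %9.0g is all a later insheet or infile has to work with, which is why the declared storage type (double) cannot rescue the lost digits.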
>
> There's no direct mention of this limitation in the
> documentation for outsheet. There is something about this in
> the manual documentation for outfile, but I had to read
> between the lines to work out the implications:
>
> "Numeric variables are output right-justified in the field
> width specified by their display format."
>
> The implications for precision follow from this, but I think
> I can be forgiven for missing it.
>
> I'm posting to the list because I think it's important enough
> to bring to people's attention. If others feel similarly,
> perhaps StataCorp can update the online documentation and
> manual to point this out, or even add options to outsheet and
> outfile to control precision independently of formatting.
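
Until such an option exists, one possible workaround is to widen the format only for the duration of the export and then put it back. This is a sketch on the two-observation example, not a general recipe: %12.0f is the format shown to work above for these integer-valued data, and values with decimals would need a different format.

```stata
* Sketch: export with a temporary wide format, then restore the old one.
clear
input int Year double GDP
1995 9963191
1996 10335489
end
format GDP %9.0g

* remember the current display format (oldfmt is an illustrative name)
local oldfmt : format GDP

* widen, export, restore
format GDP %12.0f
outsheet using testoutfile.csv, replace comma
format GDP `oldfmt'

* read the file back to confirm the digits survived
insheet using testoutfile.csv, clear case
format GDP %12.0f
list
```

The data in memory keep their original display format, and only the file on disk sees the wider one.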
>
> --Mark
>
>
> --
> Heriot-Watt University is a Scottish charity registered under
> charity number SC000278.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>