st: RE: Precision in outsheet and outfile

From	"Schaffer, Mark E" <[email protected]>
To	<[email protected]>
Subject	st: RE: Precision in outsheet and outfile
Date	Mon, 28 Feb 2011 19:03:24 -0000

Hi all.  A Statalist reader (but not subscriber) wrote to me about his
problems with -outsheet- and precision, and suggested a work-around,
namely -xmlsave-:

> Dear Mark,
> 
>   I ran into the same problem with Stata with precision and outsheet
> that you noted in the Statalist.  I couldn't figure out how to reply
> to the list, so I am writing directly.  The loss of precision and the
> lack of documentation is a major problem in my view.
> 
>    I was able to save with better precision using the xmlsave command.
>
>    Let me know if there are other work-arounds that you hear about.
> 
> Regards,
> 
> Andrew Austin, Ph.D.
> Congressional Research Service

Andrew's main point is that the behaviour of xmluse/xmlsave is the
behaviour we both expected - but we now know isn't there - from
insheet/outsheet, and that these important differences could benefit
from explicit discussion in the Stata documentation for these commands
(and I agree).

As a follow-up, I am curious to know more about how xmluse/xmlsave
maintain precision.  (I realize this is dangerously close to reopening
the double-debate!)  It's not discussed in the manual.  Does anyone
know?

--Mark
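
For anyone who wants to try Andrew's work-around, here is a minimal
sketch of an xmlsave/xmluse round trip on the two-observation GDP
example from my original post (quoted below).  The file name testxml
is just an illustration, and the final -assert- only passes if the
stored value survives the round trip, as Andrew reports it does.

```stata
* Rebuild the two-observation example from the original post
clear
input int Year double GDP
1995  9963191
1996 10335489
end

* Use the display format under which outsheet lost digits
format GDP %9.0g

* Round trip through Stata's XML format instead of outsheet/insheet
xmlsave testxml, doctype(dta) replace
xmluse testxml, doctype(dta) clear

* Check the stored value rather than the displayed one
assert GDP == 10335489 in 2
```

Note that -list- may still display 1.03e+07 after -xmluse- if the
%9.0g format is carried along, which is why the check uses -assert-
on the stored value.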

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Schaffer, Mark E
> Sent: 11 January 2011 13:40
> To: [email protected]
> Subject: st: Precision in outsheet and outfile
> 
> Hi all.  I think I've just been bitten by an (almost) 
> undocumented "feature" of outsheet (shared by outfile): the 
> precision of the values written to the file is determined by 
> the display format.
> 
> I'm using Stata 11.1 for Windows.  Stata 10.1 for Windows 
> behaves the same way.
> 
> For example, in my original data the default display format is %9.0g.
> If I change the format of the relevant variable to %12.0f, 
> and then outsheet and insheet, everything is fine:
> 
> . format GDP %12.0f
> 
> . 
> . desc GDP
> 
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %12.0f                 
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
> 
> . 
> . outsheet using testoutfile.csv, replace comma
> 
> . 
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
> 
> But if I don't change the display format, numbers too large to 
> fit (here, the 8-digit 1996 value) lose all but 3 (!!) digits 
> of precision:
> 
> . desc GDP
> 
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g                  
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
> 
> . 
> . outsheet using testoutfile.csv, replace comma
> 
> . 
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
> 
> 
> This behaviour seems to be shared by outfile, even though I'm 
> using Stata's dictionary to specify the datatype:
> 
> . desc GDP
> 
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g                  
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
> 
> . 
> . outfile using testoutfile.csv, replace dict
> 
> . 
> . infile using testoutfile.csv, clear
> 
> dictionary {
>         int    Year              `"Year"'
>         double GDP
> }
> 
> (2 observations read)
> 
> . 
> . list
> 
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
> 
> So even though Stata's dictionary format notes that GDP is a 
> double, all but 3 digits of precision are lost.
> 
> What's happening is that with the default %9.0g display format, 
> Stata switches to exponential notation once a value no longer 
> fits (in this example, at 8 digits), so it writes the 1996 value 
> above as 1.03e+07.
> 
> There's no direct mention of this limitation in the 
> documentation for outsheet.  There is something about this in 
> the manual documentation for outfile, but I had to read 
> between the lines to work out the implications:
> 
> "Numeric variables are output right-justified in the field 
> width specified by their display format."
> 
> The implications for precision follow from this, but I think 
> I can be forgiven for missing it.
> 
> I'm posting to the list because I think it's important enough 
> to bring to people's attention.  If others feel similarly, 
> perhaps StataCorp can update the online documentation and 
> manual to point this out, or even add options to outsheet and 
> outfile to control precision independently of formatting.
> 
> --Mark
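
Until such options exist, one stopgap is to widen the display format
only for the export and then put it back.  The sketch below assumes
integer-valued data as in the GDP example, so that %12.0f is wide
enough; the local macro name oldfmt is hypothetical, and the file
name is the one used in the examples above.

```stata
* Rebuild the two-observation example
clear
input int Year double GDP
1995  9963191
1996 10335489
end
format GDP %9.0g

* Remember the current display format so it can be restored afterwards
local oldfmt : format GDP

* Temporarily use a format wide enough to show every digit
format GDP %12.0f
outsheet using testoutfile.csv, replace comma

* Put the original display format back
format GDP `oldfmt'
```

Non-integer data would need a format chosen for the number of
decimals to be kept, so this is not a general fix.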
> 
> 
> --
> Heriot-Watt University is a Scottish charity registered under 
> charity number SC000278.
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 

