st: Precision in outsheet and outfile
From: "Schaffer, Mark E" <[email protected]>
To: <[email protected]>
Subject: st: Precision in outsheet and outfile
Date: Tue, 11 Jan 2011 13:39:34 -0000
Hi all. I think I've just been bitten by an (almost) undocumented
"feature" of outsheet (also shared by outfile): the precision of the
values written to the file is determined by each variable's display
format, not by its storage type.
I'm using Stata 11.1 for Windows. Stata 10.1 for Windows behaves the
same way.
For example, in my original data the default display format is %9.0g.
If I change the format of the relevant variable to %12.0f, and then
outsheet and insheet, everything is fine:
. format GDP %12.0f
.
. desc GDP
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------
GDP             double %12.0f
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   10335489 |
     +-----------------+
.
. outsheet using testoutfile.csv, replace comma
.
. insheet using testoutfile.csv, clear case
(2 vars, 2 obs)
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   10335489 |
     +-----------------+
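But if I leave the display format at %9.0g, any number too large to be
shown in full loses all but 3 (!!) significant digits. Here the
seven-digit 1995 value survives, but the eight-digit 1996 value does
not: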
. desc GDP
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------
GDP             double %9.0g
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   1.03e+07 |
     +-----------------+
.
. outsheet using testoutfile.csv, replace comma
.
. insheet using testoutfile.csv, clear case
(2 vars, 2 obs)
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   10300000 |
     +-----------------+
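A round trip like this one can be checked mechanically rather than by
eye. What follows is only a minimal sketch, not part of the original
session: it assumes the commands are run from a do-file, and the
tempfile name `original' is a hypothetical name chosen for
illustration. It saves a copy of the data, performs the same
outsheet/insheet round trip, and uses cf to compare the result with
the saved copy.

```stata
* Sketch: verify that an outsheet/insheet round trip returns the data unchanged.
* `original' is a hypothetical tempfile name chosen for this example.
preserve
tempfile original
save "`original'"

outsheet using testoutfile.csv, replace comma
insheet using testoutfile.csv, clear case

* cf stops with an error if any value differs from the saved copy;
* the verbose option lists the observations that differ.
cf _all using "`original'", verbose
restore
```

With GDP left at %9.0g, the comparison should report a difference in
the 1996 observation, since 10300000 was read back in place of
10335489.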
This behaviour seems to be shared by outfile, even though I'm using
Stata's dictionary to specify the datatype:
. desc GDP
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------
GDP             double %9.0g
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   1.03e+07 |
     +-----------------+
.
. outfile using testoutfile.csv, replace dict
.
. infile using testoutfile.csv, clear
dictionary {
    int Year `"Year"'
    double GDP
}
(2 observations read)
.
. list
     +-----------------+
     | Year        GDP |
     |-----------------|
  1. | 1995    9963191 |
  2. | 1996   10300000 |
     +-----------------+
So even though the dictionary written by Stata records GDP as a double,
the 1996 value comes back with only 3 significant digits.
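What's happening is that with the default %9.0g format (a field 9
characters wide), Stata switches to exponential notation once a number
can no longer be shown in full. In this example 9963191 is still
written out digit for digit, but 10335489 is not, so the 1996 value is
recorded as 1.03e+07.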
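The effect can be reproduced without any dataset by passing the same
two formats to display. This is a small illustrative sketch; the
numbers are the GDP values from the listings above.

```stata
* Sketch: how the display format alone changes what gets written out.
* The values are the two GDP figures from the example above.
display %9.0g 9963191     // seven digits: shown in full under %9.0g
display %9.0g 10335489    // eight digits: shown as 1.03e+07 under %9.0g
display %12.0f 10335489   // shown in full under the wider fixed format
```

Since outsheet and outfile write what the display format produces, the
second line is what ends up in the output file.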
There's no direct mention of this limitation in the documentation for
outsheet. There is something about this in the manual documentation for
outfile, but I had to read between the lines to work out the
implications:
"Numeric variables are output right-justified in the field width
specified by their display format."
The implications for precision follow from this, but I think I can be
forgiven for missing it.
I'm posting to the list because I think it's important enough to bring
to people's attention. If others feel similarly, perhaps StataCorp can
update the online documentation and manual to point this out, or even
add options to outsheet and outfile to control precision independently
of formatting.
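Until something like that exists, one possible workaround is to widen
the display format of every numeric variable just before exporting.
The following is only a sketch under stated assumptions: the format
%18.0g is an arbitrary choice, comfortably wider than the eight-digit
values in this example, and applying it to all numeric variables would
also overwrite any date formats, so it may need restricting to the
variables that matter.

```stata
* Sketch of a workaround: widen numeric display formats before export.
* Assumption: %18.0g is wide enough for the values in the data; this
* also replaces date formats on numeric variables, which may be unwanted.
preserve
ds, has(type numeric)
format `r(varlist)' %18.0g
outsheet using testoutfile.csv, replace comma
restore    // put the original display formats back
```

The preserve/restore pair keeps the change temporary, so the dataset in
memory retains its original formats after the export.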
--Mark