Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: A bug in egen and gen?
From: "Sarah Edgington" <[email protected]>
To: <[email protected]>
Subject: RE: st: A bug in egen and gen?
Date: Thu, 17 Feb 2011 12:00:35 -0800
Junlin,
Have you tried this experiment with something other than a large integer? I
think Nick's point was that you only regain the space using compress if
you're dealing exclusively with integers. You've demonstrated that
compressing a double variable in the case where all observations are
integers gets you the same storage size as if you'd started out in float.
Is the same true when the observations are not integers?
-Sarah
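One way to check this is a small experiment along the following lines. It is only a sketch: the variable names whole and frac are made up for illustration, and runiform() assumes Stata 10.1 or later.

```stata
clear
set obs 1000000
* integer in every observation, stored as double
generate double whole = 83085733
* non-integer values, stored as double
generate double frac = runiform()
describe whole frac
compress
* compare the storage types with the first -describe-
describe whole frac
```

Going by the help for -compress-, which considers demoting doubles only to long, int, or byte, whole should come out as long while frac should stay double.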
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Liao, Junlin
Sent: Thursday, February 17, 2011 11:53 AM
To: [email protected]
Subject: RE: st: A bug in egen and gen?
Nick,
I experimented with storage in Stata. Here are the results:
I generated one variable holding the single value 83085733 in 1,000,000
observations. The file sizes are

Type      Original file    After -compress-
Float     3907KB           3907KB (Long)
Double    7813KB           3907KB (Long)

A variable of type double requires twice as much storage space as a float,
and a float requires as much as a long. There is no difference once the
files are compressed to the final appropriate data type. I therefore think
my recommendation is sensible: Stata should use double by default for
calculation and then select the appropriate type for storing the data.
It is also desirable to simply -set type double- and -compress- whenever
saving data.
Thanks,
Junlin
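For anyone who wants to rerun this experiment, here is a rough sketch. The variable names xf and xd are invented here. Stata stores a float or a long in 4 bytes and a double in 8, so 1,000,000 observations come to about 3,906 KB and 7,813 KB of data, in line with the sizes reported above. A float carries only about 7 digits of accuracy, so the sketch also displays the stored values to check whether 83085733 survives in xf.

```stata
clear
set obs 1000000
generate float xf = 83085733
generate double xd = 83085733
* show the value each variable actually holds in the first observation
display %12.0f xf[1]
display %12.0f xd[1]
* expected data size in KB at 4 and at 8 bytes per observation
display 1000000*4/1024
display 1000000*8/1024
describe
compress
* compare the storage types with the first -describe-
describe
```

Because every value here is an integer, this is the most favourable case for -compress-. It says nothing yet about variables with non-integer values.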
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Liao, Junlin
Sent: Thursday, February 17, 2011 1:29 PM
To: [email protected]
Subject: RE: st: A bug in egen and gen?
I'm confused here now. Doesn't the type of a variable determine its storage
space? I'll do some experiments to see your point here. Your attention and
quick responses are greatly appreciated.
Junlin
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, February 17, 2011 1:23 PM
To: [email protected]
Subject: Re: st: A bug in egen and gen?
You're saying, in effect, that nearly doubling storage would not typically
bite users. That will be true in some cases but not all.
There is no need to wonder. The help for -save- says there is no such
option. But -compress- before -save- is naturally a very good choice and you
could program your own wrapper for -save- that always did it.
Here is a sketch:
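program jlsave
        version 8
        * shrink each variable to the smallest type that holds its values
        compress
        * pass the filename and any options (e.g. replace) through to -save-
        save `0'
end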
But if you do what you just said you wanted to do, -set type double-, using
-compress- is not going to give you back more than a fraction of the extra
storage you spend. The fraction will depend on how much you deal with
strings, variables that are always integers, etc.
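To get a feel for that fraction, something like the following could be run in a do-file. It is a sketch under assumptions: the variables id, group, x and y are hypothetical, runiform() and rnormal() need Stata 10.1 or later, and r(width) from -describe- is taken as the number of bytes per observation.

```stata
clear
set obs 1000
* new variables default to double, as proposed in the thread
set type double
* two variables that are integers in every observation
generate id = _n
generate group = mod(_n, 5)
* two variables with non-integer values
generate x = runiform()
generate y = rnormal()
quietly describe
local before = r(width)
quietly compress
quietly describe
local after = r(width)
display "bytes per observation: `before' before -compress-, `after' after"
* restore Stata's default storage type
set type float
```

Only id and group should shrink here. x and y hold non-integer values and would be expected to stay double, so the saving depends on how many variables of each kind the dataset has.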
On Thu, Feb 17, 2011 at 7:11 PM, Liao, Junlin <[email protected]> wrote:
Storage wouldn't be a problem if we ran -compress- regularly.
I'm wondering whether Stata could offer an option so that whenever it saves
data, it compresses first. That would surely be a handy way to solve this
problem.
Nick Cox
> In practice, if StataCorp always warned you of everything that could bite
> you, the help would be much, much longer.
>
> Your last suggestion would typically leave -double-s in place unless it so
> happened that the result was integers in every observation. To see why,
> study Bill Gould's recent postings on the StataCorp blog.
> That would on average nearly double your storage. If you don't mind, you
> might as well follow Stas' suggestion and -set type double-.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
________________________________
Notice: This UI Health Care e-mail (including attachments) is covered by the
Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential
and may be legally privileged. If you are not the intended recipient, you
are hereby notified that any retention, dissemination, distribution, or
copying of this communication is strictly prohibited. Please reply to the
sender that you have received the message in error, then delete it. Thank
you.
________________________________