
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: A bug in egen and gen?


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: A bug in egen and gen?
Date   Thu, 17 Feb 2011 19:04:29 +0000

In practice, if StataCorp always warned you of everything that could
bite you, the help would be much, much longer.

Your last suggestion would typically leave -double-s in place unless
it so happened that the result was integers in every observation. To
see why, study Bill Gould's recent postings on the StataCorp blog.
That would on average nearly double your storage. If you don't mind,
you might as well follow Stas' suggestion and -set type double-.
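
A minimal sketch of why a -compress- step would seldom help, assuming a fresh session with the factory default type of -float-. The variable names are illustrative only and are not from the thread.

```stata
clear
set obs 5
* make -double- the default type for new variables
set type double
* integer-valued result: -compress- can demote it to -long- without loss
generate y = 83085733
* non-integer result: -compress- leaves it as -double-
generate x = 1/3
compress
describe
* restore the factory default
set type float
```

After -compress-, y should be stored as -long- while x stays -double-, which is the usual outcome for computed results such as means.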

Spelling is Stata, not STATA.

Nick

On Thu, Feb 17, 2011 at 6:54 PM, Liao, Junlin <junlin-liao@uiowa.edu> wrote:

> I appreciate the quick reply; it's quite informative. I thought it was a bug but was not sure. Now I know that if I use the -generate- or -egen- command, I have to set the type of numeric variables.
>
> I have no desire for a perfect STATA; however, I do see where STATA could be made better. A warning about the default data type for numeric variables in the help window of the respective commands could be helpful. Better yet, with increasing computing power, those commands should perform their calculations in the highest-precision type and then perform a -compress- type operation to finalize the new variables. Just a thought.

Nick Cox

> This isn't a bug.
>
> It may well bite you, but a better description is that you get what you ask for.
>
> The default default [intended] type for new numeric variables is -float-. As well documented, -float- cannot hold sufficiently large integers accurately, which is precisely why -long- and -double- are available as alternatives. As well documented, both commands allow you to depart from the default.
>
> So Junlin already documented what is better practice, that you spell out that you want a -long-.
>
> The alternatives include
>
> 1. You always have to tell -generate-, etc. what variable type you want created. On the whole, I don't think that would be a popular change.
>
> 2. You write your own wrappers for -generate-, etc. that insist on variable type being specified. That is programmable.
>
> 3. Your attitude is that Stata should always be smart enough to work out what you want. Good luck on that one.
>
> Nick
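
Alternative 2 above might be sketched as follows. This is only an illustration: the program name tgen is hypothetical, the -version 11- line is an assumption about the release in use, and the check covers only the numeric types and str# types.

```stata
* tgen: hypothetical wrapper that refuses to run without a storage type
capture program drop tgen
program define tgen
    version 11
    * split off the first word; the rest of the command stays in `0'
    gettoken type 0 : 0
    local ok = inlist("`type'", "byte", "int", "long", "float", "double")
    if !`ok' & !regexm("`type'", "^str[0-9]+$") {
        display as error "tgen: storage type required, e.g. tgen long z = 83085733"
        exit 198
    }
    * hand the checked type and the remainder to -generate-
    generate `type' `0'
end
```

With this in place, -tgen long z = 83085733- would pass straight through to -generate-, while -tgen y = 83085733- should stop with error 198 because the first word is not a storage type.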
>
> On Thu, Feb 17, 2011 at 6:09 PM, Liao, Junlin <junlin-liao@uiowa.edu> wrote:
>
>> I happened to notice a problem with the egen and gen commands. I'm using Stata 2011 SE (64 bit). I do not know if this problem exists in other versions. Please run the following commands to reproduce the problem:
>>
>> clear
>> set obs 5
>> generate y=83085733
>> generate long z=83085733
>> egen y_mean=mean(y)
>> egen z_mean=mean(z)
>> egen long y_mean_long=mean(y)
>> egen long z_mean_long=mean(z)
>> format %10.0g y z y_mean z_mean y_mean_long z_mean_long
>> list
>>
>> By default, both the egen and gen commands store the new variable as float, and the values generated are not correct. However, if we declare the new variables as long, we get correct results.
>>
>> Has anybody else noticed the bug? Is there an explanation for what
>> happens? Thanks,
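
The arithmetic behind the wrong values can be checked directly. A -float- carries a 24-bit significand, so between 2^26 and 2^27 consecutive floats are 8 apart, and 83085733 falls between 83085728 and 83085736. A short sketch using the built-in float() function, which rounds its argument to float precision:

```stata
* the value as typed, held in double precision
display %12.0f 83085733
* the same value rounded to float precision
display %12.0f float(83085733)
* integers up to this magnitude are all held exactly in a float
display %12.0f 2^24
```

The second line should display 83085736, the nearest float, which is what ends up in y when no storage type is specified.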
