From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: A bug in egen and gen?
Date: Thu, 17 Feb 2011 19:04:29 +0000
In practice, if StataCorp always warned you of everything that could bite you, the help would be much, much longer.

Your last suggestion would typically leave -double-s in place unless it so happened that the result was integers in every observation. To see why, study Bill Gould's recent postings on the StataCorp blog. That would on average nearly double your storage. If you don't mind, you might as well follow Stas' suggestion and -set type double-.

Spelling is Stata, not STATA.

Nick

On Thu, Feb 17, 2011 at 6:54 PM, Liao, Junlin <junlin-liao@uiowa.edu> wrote:

> I appreciate the quick reply and it's quite informative. I thought it's a bug but was not sure. Now I know if I use -generate- or -egen- command, I have to set type of numeric variables.
>
> I have no desire for a perfect STATA, however, I do see where STATA could make it better. A warning about default data type for numeric variables in respective commands help window could be helpful. Better yet, with increasing computing power, those commands should perform their calculations with the highest accuracy type and then perform a -compress- type operation to finalize the new variables. Just a thought.
>
> Nick Cox
>
> This isn't a bug.
>
> It may well bite you, but a better description is that you get what you ask for.
>
> The default default [intended] type for new numeric variables is -float-. As well documented, -float- can not hold sufficiently large integers accurately, which is precisely why -long- and -double- are available as alternatives. As well documented, both commands allow you to depart from the default.
>
> So Junlin already documented what is better practice, that you spell out that you want a -long-.
>
> The alternatives include
>
> 1. You always have to tell -generate-, etc. what variable type you want created. On the whole, I don't think that would be a popular change.
>
> 2. You write your own wrappers for -generate-, etc. that insist on variable type being specified. That is programmable.
>
> 3. Your attitude is that Stata should always be smart enough to work out what you want. Good luck on that one.
>
> Nick
>
> On Thu, Feb 17, 2011 at 6:09 PM, Liao, Junlin <junlin-liao@uiowa.edu> wrote:
>
>> I happen to notice a problem with the egen and gen commands. I'm using Stata 2011 SE (64 bit). I do not know if this problem exists in other versions. Please run the following commands to reproduce the problem:
>>
>> clear
>> set obs 5
>> generate y=83085733
>> generate long z=83085733
>> egen y_mean=mean(y)
>> egen z_mean=mean(z)
>> egen long y_mean_long=mean(y)
>> egen long z_mean_long=mean(z)
>> format %10.0g y z y_mean z_mean y_mean_long z_mean_long
>> list
>>
>> By default, both egen and gen command use float for the size of the number and the values generated are not correct. However, if we restrict the numbers to be long integer, we can get correct results.
>>
>> Anybody else noticed the bug? Is there an explanation for what
>> happens? Thanks,

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
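
The behaviour discussed in this thread can be checked directly. The following is a minimal sketch, not code from any of the posters: the variable names w and v are made up for illustration, and the value 83085733 is taken from the original example. It uses float() to show the rounding that -float- storage imposes, then the two remedies mentioned above: spelling out the type on each command, and -set type double- followed by -compress-.

```stata
* Sketch only: w and v are illustrative names, not from the thread.
clear
set obs 5

* float() rounds its argument to float precision. Floats between 2^26 and
* 2^27 are spaced 8 apart, so 83085733 is stored as the nearest multiple of 8.
display %12.0f float(83085733)    // shows 83085736

* Default type (float) versus explicitly requested types
generate y = 83085733
generate long z = 83085733
generate double w = 83085733

* The same choice applies to -egen-. Note that y already holds the rounded
* value, so asking for a wider type here cannot recover the lost digits.
egen double y_mean = mean(y)
egen double z_mean = mean(z)

format %12.0f y z w y_mean z_mean
list

* Alternative: make double the default for new variables, then let -compress-
* demote any variable that can be stored in a smaller type without loss.
set type double
generate v = 83085733
compress v
describe v
set type float    // restore the default for the rest of the session
```

Since v holds the same integer in every observation, -compress- should be able to store it as a -long-. A variable with fractional values would stay -double-, which is the storage cost pointed out in the reply above.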