Since precision came up again, I would like to add a comment:
While it is good practice to minimize rounding errors during computation (e.g. when computing new variables that are sums), you should keep in mind how precisely that variable was measured in the first place. For instance, I teach introductory statistics to first-year social science students. In the Netherlands grades run from 0 (didn't even spell their own name right) to 10 (brilliant). Each year at least one student asks whether I would give grades to two decimal places. I think I make good exams, but my exams cannot distinguish between a student whose statistics ability is worth a 6.01 and one whose ability is worth a 6.02.
About seven significant digits of accuracy (float) should be more than enough for most measurements; for most real data I would consider sixteen digits (double) overkill.
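To put numbers on the float/double comparison, here is a small sketch in Python rather than Stata. It assumes that Stata's 4-byte float rounds like a standard IEEE 754 single-precision number. Stata reserves some codes for missing values, but that only affects the extreme range. The helper `to_float32` is hypothetical: it simply round-trips a value through 4 bytes with the standard `struct` module.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def relative_error(x: float) -> float:
    """Relative error caused by storing x in 4 bytes instead of 8."""
    return abs(to_float32(x) - x) / abs(x)

# Grades on a 0-10 scale with two decimals stay clearly distinct in a float.
grade_a, grade_b = to_float32(6.01), to_float32(6.02)
print(grade_a != grade_b)  # True

# Round-to-nearest single precision never errs by more than 2**-24 (about 6e-8)
# relative to the value, i.e. roughly 7 significant decimal digits survive.
for value in (6.01, 0.1, 1234.5678, 109315245.0):
    print(f"{value:>12} -> {to_float32(value):.9g}  (relative error {relative_error(value):.1e})")
```

Seven significant digits is far more than an exam score, or most survey measurements, can meaningfully carry.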
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
-----Original Message-----
From: [email protected] [mailto:[email protected]]On Behalf Of Maarten Buis
Sent: Monday 28 November 2005 16:57
To: [email protected]
Subject: st: RE: RE: generating a new variable with the egen command
Furthermore, calculating the sums in float format isn't as bad as it initially seems: the fact that Stata shows only 3 digits has to do with the default display format. A float carries about 7 significant digits, and those digits are stored, as can be seen in the program below.
*-------------------------example------------------
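drop _all
* store x as double, as the original poster did
input double x
42224464
67090781
end
* -egen- creates a float by default; a -long- holds integers exactly
egen sum = sum(x)
egen long sumlong = sum(x)
* with the default display format only a few digits of the float are shown
list sum sumlong
* a wider display format shows more of the digits the float stores
format sum %20.7g
list sum sumlong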
*------------end example-----------------
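For readers without Stata at hand, the same arithmetic can be sketched in Python. The sketch assumes, as I understand -egen- to work, that the sum is first computed at full precision and only rounded when it is stored in the new variable's type. The `to_float32` helper is hypothetical and stands in for a Stata float, assuming IEEE 754 single precision.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

a, b = 42224464, 67090781     # the two values from the original question
exact = a + b                 # exact integer sum, as a -long- would hold it
as_float = to_float32(exact)  # what a 4-byte float variable can hold

print(exact)                  # 109315245
print(int(as_float))          # 109315248: the first 7 digits agree
print(f"{as_float:.3g}")      # 1.09e+08, roughly what a narrow display format shows
```

Above 2^26 = 67,108,864 a float can only hold multiples of 8, so the sum lands on the nearest one. The float is therefore close but not exact, while a -long- or -double- variable keeps every digit.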
HTH,
Maarten
-----Original Message-----
From: [email protected] [mailto:[email protected]]On Behalf Of Nick Cox
Sent: Monday 28 November 2005 16:42
To: [email protected]
Subject: st: RE: generating a new variable with the egen command
Your call should be
egen double newvar = ...
or
egen long newvar = ...
If these are integers, -long- is better. Either type
is allowed by the syntax. See the help for -egen- once more.
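Why -long- suits integer sums can be seen from where each type stops representing every integer. The sketch below is Python and assumes IEEE 754 behaviour for float and double. It takes the -long- limit of 2,147,483,620 from Stata's data-types documentation, and `to_float32` is again a hypothetical helper. A -long- also needs only 4 bytes per observation against 8 for a -double-.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Beyond these points not every integer can be represented exactly.
FLOAT_EXACT = 2**24   # 16,777,216 for a 4-byte float
DOUBLE_EXACT = 2**53  # 9,007,199,254,740,992 for an 8-byte double

print(to_float32(FLOAT_EXACT) == FLOAT_EXACT)          # True
print(to_float32(FLOAT_EXACT + 1) == FLOAT_EXACT + 1)  # False: rounds back to 2**24
print(float(DOUBLE_EXACT + 1) == DOUBLE_EXACT + 1)     # False: rounds back to 2**53

# Stata's -long- is a 4-byte integer whose largest non-missing value is
# 2,147,483,620 (the top codes are reserved for missing values).
LONG_MAX = 2_147_483_620
total = 42224464 + 67090781
print(total <= LONG_MAX)  # True: the sum fits in a long exactly
```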
Nick
[email protected]
>[email protected]
>
> I'm trying to generate a new variable given by the sum of an existing variable
> by year (as I have 2 six-month observations for the same year) and individual.
> The command is the following:
> egen newvar=sum (oldvar), by(id year)
>
> The problem is that the original values are, for instance, 42224464 and
> 67090781, but the sum that Stata computes is 109000000 instead of 109315245.
> Why does it round like that?
> The old variable is stored as "double". I have tried to change the storage
> type using recast, but Stata does not allow it, as many values would be
> changed...
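The refusal from -recast- is itself a clue. Between 2^25 and 2^26 (33,554,432 to 67,108,864) a 4-byte float can only hold multiples of 4. 42224464 is one, but 67090781 is not, so it cannot survive conversion to float. The Python sketch below lists which of the quoted values would change. It uses a hypothetical helper, assumes IEEE single precision for Stata's float, and is not Stata's own check. In any case the storage type of -oldvar- was never the problem; the new variable created by -egen- needs a -long- or -double- type.

```python
import struct

def to_float32(x: float) -> float:
    """Round x to the nearest 4-byte IEEE 754 single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# The two example values from the question, held exactly as 8-byte doubles.
values = [42224464.0, 67090781.0]

# Map each value that a 4-byte float would alter to the value it would become.
changed = {v: to_float32(v) for v in values if to_float32(v) != v}
print(changed)  # {67090781.0: 67090780.0}
```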
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*