| |
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: RE: st: behavior of -areg-
My response to the statalist requires a rebuttal on my part. The promotion
from single to double precision is done in binary so my demonstration is
misleading in that I listed the promoted variables in base 10. If you view the
variables in hexidecimal you can see that variable fvalue generated by
. gen double fvalue = mvalue
is really padded in zeros, but the dvalue generated by
. gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue)))))
is not.
. format %21x mvalue fvalue dvalue
. list fvalue mvalue dvalue in 1/20
+-----------------------------------------------------------------------+
| fvalue mvalue dvalue |
|-----------------------------------------------------------------------|
1. | +1.80d0000000000X+00b +1.80d0000000000X+00b +1.80d0000000001X+00b |
2. | +1.235b340000000X+00c +1.235b340000000X+00c +1.235b333333334X+00c |
3. | +1.50b19a0000000X+00c +1.50b19a0000000X+00c +1.50b199999999bX+00c |
4. | +1.5d06660000000X+00b +1.5d06660000000X+00b +1.5d06666666668X+00b |
5. | +1.0d93340000000X+00c +1.0d93340000000X+00c +1.0d93333333334X+00c |
|-----------------------------------------------------------------------|
6. | +1.223e660000000X+00c +1.223e660000000X+00c +1.223e666666667X+00c |
7. | +1.1c73340000000X+00c +1.1c73340000000X+00c +1.1c73333333334X+00c |
8. | +1.9583340000000X+00b +1.9583340000000X+00b +1.9583333333335X+00b |
9. | +1.fab6660000000X+00b +1.fab6660000000X+00b +1.fab6666666668X+00b |
10. | +1.11b4cc0000000X+00c +1.11b4cc0000000X+00c +1.11b4cccccccceX+00c |
|-----------------------------------------------------------------------|
11. | +1.2e8e660000000X+00c +1.2e8e660000000X+00c +1.2e8e666666667X+00c |
12. | +1.324e660000000X+00c +1.324e660000000X+00c +1.324e666666667X+00c |
13. | +1.b8d0000000000X+00b +1.b8d0000000000X+00b +1.b8d0000000002X+00b |
14. | +1.96d6660000000X+00b +1.96d6660000000X+00b +1.96d6666666668X+00b |
15. | +1.ce86660000000X+00b +1.ce86660000000X+00b +1.ce86666666668X+00b |
|-----------------------------------------------------------------------|
16. | +1.d573340000000X+00b +1.d573340000000X+00b +1.d573333333335X+00b |
17. | +1.2e10000000000X+00c +1.2e10000000000X+00c +1.2e10000000001X+00c |
18. | +1.33ce660000000X+00c +1.33ce660000000X+00c +1.33ce666666667X+00c |
19. | +1.861b340000000X+00c +1.861b340000000X+00c +1.861b333333335X+00c |
20. | +1.5d999a0000000X+00c +1.5d999a0000000X+00c +1.5d9999999999bX+00c |
+-----------------------------------------------------------------------+
> I would like to add to this demonstration where I show the
> dangers of having single precision variables (float) and not
> promoting them to double precision with caution.
>
> Note that single precision will have only about 7 digits
> of precision while double has about 16 corresponding to
> 24 and 53 bits for the mantissia:
>
> . di c(epsfloat)
> 1.192e-07
>
> . di c(epsdouble)
> 2.220e-16
>
> . di 2^(1-24)
> 1.192e-07
>
> . di 2^(1-53)
> 2.220e-16
>
> machine epsilon is 2^(1-p) where p is the number of bits
> reserved for the mantissia of floating point numbers.
>
> Now take a look at the storage type for each of the variables
> in the grunfeld dataset.
>
> . webuse grunfeld,clear
>
> . describe
>
> Contains data from http://www.stata-press.com/data/r9/grunfeld.dta
> obs: 200
> vars: 6 3 Mar 2005 20:27
> size: 5,600 (99.5% of memory free)
> -------------------------------------------------------------------------------
> storage display value
> variable name type format label variable label
> -------------------------------------------------------------------------------
> company float %9.0g
> year float %9.0g
> invest float %9.0g
> mvalue float %9.0g
> kstock float %9.0g
> time float %9.0g
>
> . list in 1/20
>
> +--------------------------------------------------+
> | company year invest mvalue kstock time |
> |--------------------------------------------------|
> 1. | 1 1935 317.6 3078.5 2.8 1 |
> 2. | 1 1936 391.8 4661.7 52.6 2 |
> 3. | 1 1937 410.6 5387.1 156.9 3 |
> 4. | 1 1938 257.7 2792.2 209.2 4 |
> 5. | 1 1939 330.8 4313.2 203.4 5 |
> |--------------------------------------------------|
> 6. | 1 1940 461.2 4643.9 207.2 6 |
> 7. | 1 1941 512 4551.2 255.2 7 |
> 8. | 1 1942 448 3244.1 303.7 8 |
> 9. | 1 1943 499.6 4053.7 264.1 9 |
> 10. | 1 1944 547.5 4379.3 201.6 10 |
> |--------------------------------------------------|
> 11. | 1 1945 561.2 4840.9 265 11 |
> 12. | 1 1946 688.1 4900.9 402.2 12 |
> 13. | 1 1947 568.9 3526.5 761.5 13 |
> 14. | 1 1948 529.2 3254.7 922.4 14 |
> 15. | 1 1949 555.1 3700.2 1020.1 15 |
> |--------------------------------------------------|
> 16. | 1 1950 642.9 3755.6 1099 16 |
> 17. | 1 1951 755.9 4833 1207.7 17 |
> 18. | 1 1952 891.2 4924.9 1430.5 18 |
> 19. | 1 1953 1304.4 6241.7 1777.3 19 |
> 20. | 1 1954 1486.7 5593.6 2226.3 20 |
> +--------------------------------------------------+
>
> We can extend the display format of mvalue and kstock and you will see
> what you are going to get beyond the 7-th digit when you promote these
> variables to double precision.
>
> . format %20.15g kstock mvalue
>
> . gen double fvalue = mvalue
>
> . gen double fstock = kstock
>
> . format %20.15g fvalue fstock
>
> . list fvalue mvalue fstock kstock in 1/20
>
> +---------------------------------------------------------------------------+
> | fvalue mvalue fstock kstock | |---------------------------------------------------------------------------|
> 1.| 3078.5 3078.5 2.79999995231628 2.79999995231628 |
> 2.| 4661.7001953125 4661.7001953125 52.5999984741211 52.5999984741211 |
> 3.| 5387.10009765625 5387.10009765625 156.899993896484 156.899993896484 |
> 4.| 2792.19995117188 2792.19995117188 209.199996948242 209.199996948242 |
> 5.| 4313.2001953125 4313.2001953125 203.399993896484 203.399993896484 |
> |---------------------------------------------------------------------------|
> 6.| 4643.89990234375 4643.89990234375 207.199996948242 207.199996948242 |
> 7.| 4551.2001953125 4551.2001953125 255.199996948242 255.199996948242 |
> 8.| 3244.10009765625 3244.10009765625 303.700012207031 303.700012207031 |
> 9.| 4053.69995117188 4053.69995117188 264.100006103516 264.100006103516 |
> 10.| 4379.2998046875 4379.2998046875 201.600006103516 201.600006103516 |
> |---------------------------------------------------------------------------|
> 11 | 4840.89990234375 4840.89990234375 265 265 |
> 12.| 4900.89990234375 4900.89990234375 402.200012207031 402.200012207031 |
> 13.| 3526.5 3526.5 761.5 761.5 |
> 14.| 3254.69995117188 3254.69995117188 922.400024414063 922.400024414063 |
> 15.| 3700.19995117188 3700.19995117188 1020.09997558594 1020.09997558594 |
> |---------------------------------------------------------------------------|
> 16.| 3755.60009765625 3755.60009765625 1099 1099 |
> 17.| 4833 4833 1207.69995117188 1207.69995117188 |
> 18.| 4924.89990234375 4924.89990234375 1430.5 1430.5 |
> 19.| 6241.7001953125 6241.7001953125 1777.30004882813 1777.30004882813 |
> 20.| 5593.60009765625 5593.60009765625 2226.30004882813 2226.30004882813 |
> +---------------------------------------------------------------------------+
>
>
> . egen double t1 = total(mva), by(com)
>
> . egen double t2 = total(kstock), by(com)
>
> . format %20.15g t1 t2
>
> . list t1 t2 in 1/20
>
> +------------------------------------+
> | t1 t2 |
> |------------------------------------|
> 1. | 86676.900390625 12968.7000625134 |
> 2. | 86676.900390625 12968.7000625134 |
> 3. | 86676.900390625 12968.7000625134 |
> 4. | 86676.900390625 12968.7000625134 |
> 5. | 86676.900390625 12968.7000625134 |
> |------------------------------------|
> 6. | 86676.900390625 12968.7000625134 |
> 7. | 86676.900390625 12968.7000625134 |
> 8. | 86676.900390625 12968.7000625134 |
> 9. | 86676.900390625 12968.7000625134 |
> 10. | 86676.900390625 12968.7000625134 |
> |------------------------------------|
> 11. | 86676.900390625 12968.7000625134 |
> 12. | 86676.900390625 12968.7000625134 |
> 13. | 86676.900390625 12968.7000625134 |
> 14. | 86676.900390625 12968.7000625134 |
> 15. | 86676.900390625 12968.7000625134 |
> |------------------------------------|
> 16. | 86676.900390625 12968.7000625134 |
> 17. | 86676.900390625 12968.7000625134 |
> 18. | 86676.900390625 12968.7000625134 |
> 19. | 86676.900390625 12968.7000625134 |
> 20. | 86676.900390625 12968.7000625134 |
> +------------------------------------+
>
> You can see here that even though we have made the variables
> t1 and t2 double precision, we really only have done single
> precision numerics (actually a little better than single
> precision) and the variables have only 9 digits that
> are any good. We gained some precsion, but we did not
> do as well as we would have if we started with full double
> precision. This garbage beyond the 9-th digit can be
> enough for -_rmcoll- to not detect variable collinearity.
>
> Now if we promote mvalue and kstock to double carefully
> we can get full precision.
>
> . scalar digits = round(log10(abs(c(epsfloat))),1)+1
>
> . gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue)))))
>
> . gen double dstock = round(kstock,10^(digits+ceil(log10(abs(kstock)))))
>
> . format %20.15g dvalue dstock
>
> . list dvalue mvalue dstock kstock in 1/20
>
> +-------------------------------------------------------+
> | dvalue mvalue dstock kstock |
> |-------------------------------------------------------|
> 1. | 3078.5 3078.5 2.8 2.79999995231628 |
> 2. | 4661.7 4661.7001953125 52.6 52.5999984741211 |
> 3. | 5387.1 5387.10009765625 156.9 156.899993896484 |
> 4. | 2792.2 2792.19995117188 209.2 209.199996948242 |
> 5. | 4313.2 4313.2001953125 203.4 203.399993896484 |
> |-------------------------------------------------------|
> 6. | 4643.9 4643.89990234375 207.2 207.199996948242 |
> 7. | 4551.2 4551.2001953125 255.2 255.199996948242 |
> 8. | 3244.1 3244.10009765625 303.7 303.700012207031 |
> 9. | 4053.7 4053.69995117188 264.1 264.100006103516 |
> 10. | 4379.3 4379.2998046875 201.6 201.600006103516 |
> |-------------------------------------------------------|
> 11. | 4840.9 4840.89990234375 265 265 |
> 12. | 4900.9 4900.89990234375 402.2 402.200012207031 |
> 13. | 3526.5 3526.5 761.5 761.5 |
> 14. | 3254.7 3254.69995117188 922.4 922.400024414063 |
> 15. | 3700.2 3700.19995117188 1020.1 1020.09997558594 |
> |-------------------------------------------------------|
> 16. | 3755.6 3755.60009765625 1099 1099 |
> 17. | 4833 4833 1207.7 1207.69995117188 |
> 18. | 4924.9 4924.89990234375 1430.5 1430.5 |
> 19. | 6241.7 6241.7001953125 1777.3 1777.30004882813 |
> 20. | 5593.6 5593.60009765625 2226.3 2226.30004882813 |
> +-------------------------------------------------------+
>
> . egen double d1 = total(dvalue), by(com)
>
> . egen double d2 = total(dstock), by(com)
>
> . format %20.15g d1 d2
>
> . list d1 t1 d2 t2 in 1/20
>
> +--------------------------------------------------------+
> | d1 t1 d2 t2 |
> |--------------------------------------------------------|
> 1. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 2. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 3. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 4. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 5. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> |--------------------------------------------------------|
> 6. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 7. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 8. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 9. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 10. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> |--------------------------------------------------------|
> 11. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 12. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 13. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 14. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 15. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> |--------------------------------------------------------|
> 16. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 17. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 18. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 19. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> 20. | 86676.9 86676.900390625 12968.7 12968.7000625134 |
> +--------------------------------------------------------+
>
> Now we do Scott's experiment in double precision:
>
> . gen double dinvest = round(invest,10^(digits+ceil(log10(abs(invest)))))
>
> . recast long company
>
> . areg dinvest dstock d1, ab(com)
>
> Linear regression, absorbing indicators Number of obs = 200
> F( 1, 189) = 366.45
> Prob > F = 0.0000
> R-squared = 0.9184
> Adj R-squared = 0.9141
> Root MSE = 63.566
>
> ------------------------------------------------------------------------------
> dinvest | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> dstock | .3707496 .0193676 19.14 0.000 .3325452 .4089541
> d1 | (dropped)
> _cons | 43.62499 6.984315 6.25 0.000 29.84777 57.40222
> -------------+----------------------------------------------------------------
> company | F(9, 189) = 36.975 0.000 (10 categories)
>
> . areg dinvest dvalue d2, ab(com)
>
> Linear regression, absorbing indicators Number of obs = 200
> F( 1, 189) = 111.35
> Prob > F = 0.0000
> R-squared = 0.8491
> Adj R-squared = 0.8411
> Root MSE = 86.444
>
> ------------------------------------------------------------------------------
> dinvest | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> dvalue | .1898776 .0179944 10.55 0.000 .1543819 .2253733
> d2 | (dropped)
> _cons | -59.42872 20.40144 -2.91 0.004 -99.6725 -19.18494
> -------------+----------------------------------------------------------------
> company | F(9, 189) = 15.973 0.000 (10 categories)
>
>
>
> So my advice is to be very wary of single precision floating point variable.
>
>
> -Rich
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/