Re: st: Identifying unique values with codebook
From: Michael Mitchell <[email protected]>
To: [email protected]
Subject: Re: st: Identifying unique values with codebook
Date: Thu, 17 Jun 2010 13:57:21 -0700

I agree that, for the "typical" variable, storing the value as a
-float- is not a problem. Unfortunately, I have found that people
often discover that they have an "atypical" variable only after the
fact, after precision has already been lost by storing it as a -float-.
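
As a minimal sketch of such an "atypical" variable, consider a
nine-digit identifier. The value 123456789 and the variable names
below are made up for illustration and do not come from any real
dataset. A -float- holds only about 7 digits of accuracy, so the
identifier cannot be stored exactly, while a -long- or a -double- can
hold it.

```stata
* Hypothetical 9-digit identifier; value and variable names are invented.
clear
set obs 1
generate float  id_float  = 123456789
generate double id_double = 123456789
generate long   id_long   = 123456789

* Show all digits so that any rounding is visible.
format id_float id_double id_long %12.0f
list id_float id_double id_long

* id_float holds the nearest representable float, which is not 123456789;
* id_double and id_long hold the value exactly.
```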

But the problem also arises for typical variables when making
comparisons using fractional values. For example, using the -auto-
dataset, I want to see the cars that have a gear ratio of 2.19. As
shown below, it would appear that there are not any.

. sysuse auto
(1978 Automobile Data)

. describe gear_ratio

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------
gear_ratio      float   %6.2f                 Gear Ratio

. list make gear_ratio if gear_ratio == 2.19, abb(30)
<No observations shown>

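But, as -help data_types- tells us, the comparison fails because
-gear_ratio- is stored as a -float-, whereas the literal 2.19 is
evaluated in double precision. We need to round the literal to float
precision with the -float()- function: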

. list make gear_ratio if gear_ratio == float(2.19), abb(30)

     +----------------------------+
     | make            gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
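
To see why the first comparison fails, it can help to display both
sides of the comparison with more digits than the %6.2f display format
shows. The following is only an illustrative sketch using -display-
and the -float()- function; the %20.18f format is an arbitrary choice
that is wide enough to expose the difference.

```stata
* The literal 2.19 as Stata evaluates it (double precision).
display %20.18f 2.19

* The same literal rounded to float precision, which matches what a
* float variable such as gear_ratio actually stores.
display %20.18f float(2.19)

* Comparisons are carried out in double precision:
display 2.19 == float(2.19)           // 0 (false)
display float(2.19) == float(2.19)    // 1 (true)
```

The two displayed values agree only in their leading digits, which is
why -gear_ratio == 2.19- matches no observations.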

Alternatively, suppose I have a dataset called -auto_double- that
stores -gear_ratio- as a -double-, because I had issued -set type
double- before creating the dataset.

. use auto_double

. describe gear_ratio

              storage   display    value
variable name   type    format     label      variable label
----------------------------------------------------------------
gear_ratio      double  %10.0g

Now, when I look for cars with a gear ratio of 2.19, I see them
without extra effort.

. list make gear_ratio if gear_ratio == 2.19, abb(30)

     +----------------------------+
     | make            gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
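
The -auto_double- dataset is not shipped with Stata, so here is a
hedged sketch of the same idea on a one-observation toy dataset. The
variable names -gr- and -gr_f- are hypothetical. The sketch also shows
one caveat: using -recast- to promote an existing float variable to
double changes the storage type but does not restore the digits that
were rounded away, so the value has to be created as a double in the
first place.

```stata
clear
set obs 1

* With -set type double-, variables created without an explicit type
* are stored as doubles.
set type double
generate gr = 2.19
count if gr == 2.19          // 1: matches without float()

* Restore Stata's shipped default and create a float for contrast.
set type float
generate gr_f = 2.19
count if gr_f == 2.19        // 0: the stored value is float(2.19)

* Promoting afterwards keeps the already-rounded value.
recast double gr_f
count if gr_f == 2.19        // still 0
```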
And, if I feel that I have variables that are wastefully stored as
type -double-, I can use the -compress- command to convert variables
to a more frugal storage type, such as byte, int, or long.
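
A minimal sketch of that workflow with the -auto- dataset follows.
Promoting the variables with -recast- first is my own setup for the
illustration, standing in for a dataset that was created under -set
type double-. According to -help compress-, the command demotes a
variable only when doing so loses no precision.

```stata
sysuse auto, clear

* Simulate variables that were created as doubles.
recast double price mpg gear_ratio
describe price mpg gear_ratio

* Let Stata pick the smallest storage type that holds every value exactly.
compress price mpg gear_ratio
describe price mpg gear_ratio

* price and mpg hold only integers, so they should be demoted to integer
* types; gear_ratio has fractional values and should remain a double.
```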

I agree that, for most variables, doubles are wasteful of space.
But I prefer to start with a double and then have the option to go
down to a smaller storage type, rather than start with a float and be
unable to upgrade to a more precise storage type after precision has
been lost.

Best regards,
Michael Mitchell

On Thu, Jun 17, 2010 at 9:51 AM, Maarten Buis <[email protected]> wrote:
> --- On Thu, 17/6/10, Michael N. Mitchell wrote:
>> It seems to me that many of the "gotchas" arise
>> from the fact that the default data type is "float"
>> instead of "double".
>
> The typical variable in a dataset contains some sort
> of measurement, and most measurements are nowhere
> near as precise to warrant anything more than 2 or
> 3 digits of precision, so "float" is a perfectly
> sensible default. This leaves variables that are
> supposed to represent a unique identification
> number. Here double or long may help, but these
> too can easily become too short for those cases,
> which would then require you to switch to strings.
> So, I am not convinced about the usefulness of a
> switch of the default to double.
>
> -- Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>