Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Identifying unique values with codebook

From	Michael Mitchell <[email protected]>
To	[email protected]
Subject	Re: st: Identifying unique values with codebook
Date	Thu, 17 Jun 2010 13:57:21 -0700

I agree that for the "typical" variable, storing the value as a
-float- is not a problem. Unfortunately, I have found that people
often discover they have an "atypical" variable only after the fact,
once precision has already been lost by storing it as a -float-.
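To make the "after the fact" problem concrete for identification numbers (the subject of this thread), here is a small Python sketch. Python is used only for illustration, and the sketch assumes that Stata's -float- and -double- behave like IEEE 754 single and double precision. The helper to_float32() and the ID values are hypothetical. A single-precision float holds integers exactly only up to 2^24 = 16,777,216, so two distinct IDs at that limit collapse into one value. A double has the same problem, but only beyond 2^53.

```python
import struct


def to_float32(x):
    """Round x to 4-byte IEEE 754 single precision (assumed to match a Stata float)."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Hypothetical ID numbers belonging to two different people
ids = [16_777_216, 16_777_217]

as_float = {to_float32(i) for i in ids}   # distinct values a float variable would hold
as_double = {float(i) for i in ids}       # distinct values a double variable would hold

print(len(as_float))    # 1: the two IDs are no longer distinguishable
print(len(as_double))   # 2: a double still tells them apart

# Doubles run out too, only much later (above 2**53)
print(float(2**53 + 1) == float(2**53))   # True
```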

But this also arises for typical variables when making comparisons
with fractional values. For example, using the -auto- dataset, suppose
I want to see the cars that have a gear ratio of 2.19. As shown below,
it would appear that there are none:

. sysuse auto
(1978 Automobile Data)
. describe gear_ratio

              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------
gear_ratio      float  %6.2f                  Gear Ratio

. list  make gear_ratio if gear_ratio == 2.19, abb(30)

<No observations shown>

But, as -help data_types- explains, because -gear_ratio- is a float
we need to compare it against float(2.19) instead:

. list  make gear_ratio if gear_ratio == float(2.19), abb(30)

     +----------------------------+
     | make            gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+
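The following Python sketch shows why the first -list- finds nothing and the second one works. It assumes that Stata's -float- is IEEE 754 single precision and that the literal 2.19 in an expression is evaluated in double precision. The helper to_float32() is a hypothetical stand-in for Stata's float() function.

```python
import struct


def to_float32(x):
    """Round x to single precision, playing the role of Stata's float()."""
    return struct.unpack("f", struct.pack("f", x))[0]


# The value a float variable actually holds after 2.19 was stored in it
gear_ratio = to_float32(2.19)

print(gear_ratio == 2.19)               # False: 2.19 as a double differs slightly
print(gear_ratio == to_float32(2.19))   # True: both sides are rounded the same way
print(repr(gear_ratio))                 # shows the small representation error
```

The listed value still looks like 2.19 because the %6.2f display format rounds it to two decimals.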

By contrast, I have a dataset called -auto_double- that stores
-gear_ratio- as a -double-, because I had issued -set type double-
before creating the dataset.

. use auto_double
. describe gear_ratio

              storage  display     value
variable name   type   format      label      variable label
--------------------------------------------------------------------------------------------------
gear_ratio      double %10.0g

Now, when I look for cars with a gear ratio of 2.19, I see them
without any extra effort.

. list  make gear_ratio if gear_ratio == 2.19, abb(30)

     +----------------------------+
     |          make   gear_ratio |
     |----------------------------|
 12. | Cad. Eldorado         2.19 |
     +----------------------------+

And if I suspect that some variables are wastefully stored as
-double-, I can use the -compress- command to convert them to a more
frugal storage type, such as byte, int, or long.
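The idea behind demoting a variable can be sketched as picking the smallest storage type that still holds every value exactly. This is a hypothetical Python function: smallest_type() is not part of Stata, and the sketch does not reproduce how -compress- is actually implemented. The byte, int and long ranges are the limits listed in -help data_types-, and the float check assumes IEEE 754 single precision.

```python
import struct

# Integer ranges of Stata's integer storage types (per -help data_types-).
# Each upper limit is 27 below the usual maximum because the top codes are
# reserved for the missing values . and .a through .z.
RANGES = [
    ("byte", -127, 100),
    ("int", -32_767, 32_740),
    ("long", -2_147_483_647, 2_147_483_620),
]


def fits_float32(x):
    """True if x survives a round trip through 4-byte single precision."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0] == x
    except OverflowError:  # too large for a single-precision float
        return False


def smallest_type(values):
    """Hypothetical compress-style choice: smallest type holding every value exactly."""
    if all(float(v).is_integer() for v in values):
        lo, hi = min(values), max(values)
        for name, tmin, tmax in RANGES:
            if tmin <= lo and hi <= tmax:
                return name
    if all(fits_float32(float(v)) for v in values):
        return "float"
    return "double"


print(smallest_type([1, 5, 42]))     # byte
print(smallest_type([1, 5000]))      # int
print(smallest_type([2.5, 3.75]))    # float
print(smallest_type([2.19, 3.08]))   # double
```

Gear ratios such as 2.19 stay -double- here because they do not survive a round trip through single precision. Values like 2.5 are exact binary fractions and can safely be demoted.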

I agree that, for most variables, doubles waste space. But I would
rather start with a double and keep the option of moving down to a
smaller storage type than start with a float and be unable to recover
the lost precision by moving up to a more precise storage type.
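The asymmetry described here can be checked with a short Python sketch. It makes the same assumption that -float- and -double- are IEEE 754 single and double precision, and to_float32() is again a hypothetical helper. Demoting a double later yields exactly the float you would have had from the start. Widening a float back to double, however, does not restore the digits that were rounded away.

```python
import struct


def to_float32(x):
    """Round x to single precision (assumed to match a Stata float)."""
    return struct.unpack("f", struct.pack("f", x))[0]


reading = 2.19                    # value kept as a double

# Path 1: keep the double and demote only later
demoted_later = to_float32(reading)

# Path 2: store as float from the start, then recast to double
stored_as_float = to_float32(reading)
widened = float(stored_as_float)  # widening changes nothing; the rounding error stays

print(demoted_later == stored_as_float)  # True: nothing lost by waiting
print(widened == reading)                # False: the original 2.19 is not recovered
```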

Best regards,

Michael Mitchell



On Thu, Jun 17, 2010 at 9:51 AM, Maarten buis <[email protected]> wrote:
> --- On Thu, 17/6/10, Michael N. Mitchell wrote:
>> It seems to me that many of the "gotchas" arise
>> from the fact that the default data type is "float"
>> instead of "double".
>
> The typical variable in a dataset contains some sort
> of measurement, and most measurements are nowhere
> near precise enough to warrant more than 2 or 3
> digits of precision, so "float" is a perfectly
> sensible default. This leaves variables that are
> supposed to represent a unique identification
> number. Here double or long may help, but these
> too can easily become too short for those cases,
> which would then require you to switch to strings.
> So, I am not convinced of the usefulness of
> switching the default to double.
>
> -- Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

