
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: The accuracy of the float data type


From   R Zhang <[email protected]>
To   [email protected]
Subject   Re: st: The accuracy of the float data type
Date   Fri, 24 Jan 2014 15:13:23 -0500

Correction: sales was double %12.0g; maxsale was float %9.0g. My apologies.
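
[Editorial sketch, not part of the original message.] With the types now confirmed (sales is double, maxsale is float), the two remedies Nick Winter describes in the quoted text can be tried side by side. This is only a minimal sketch under stated assumptions: the three observations are toy data, year stands in for the real grouping variables, and the names maxsale_d, maxsale_f, PriSIC_d and PriSIC_f are hypothetical, chosen for illustration.

```stata
* Sketch only: toy data standing in for the real segment file (assumption)
clear
input double sales int year
25.395 1990
32.007 1991
21.927 1991
end

* Remedy 1: ask -egen- for a double result, so both sides of == have the same type
egen double maxsale_d = max(sales), by(year)
generate byte PriSIC_d = sales == maxsale_d

* Remedy 2: keep a float maximum, but round sales to float precision when comparing
egen float maxsale_f = max(sales), by(year)
generate byte PriSIC_f = float(sales) == maxsale_f

* Inspect the storage types and compare the two indicators
describe sales maxsale_d maxsale_f
list, sepby(year)
```

Under these assumptions both indicators should flag the 25.395 and 32.007 rows and leave the 21.927 row at 0. Remedy 1 keeps the full precision of sales; remedy 2 is the one to reach for when the float variable already exists and cannot easily be regenerated.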

On Fri, Jan 24, 2014 at 1:09 PM, Nick Cox <[email protected]> wrote:
> I wondered that too, but Rochelle said that both variables were
> -float-. But if that is not so, then it's likely to be the
> explanation.
>
> Note by the way that Stata does not use terminology such as "storage
> format". Display format and variable type are, as Nick Winter implies,
> quite different notions.
>
> Nick
> [email protected]
>
>
> On 24 January 2014 17:32, Nick Winter <[email protected]> wrote:
>> Perhaps the problem comes because the *storage* format of sales and maxsale
>> are different.  (This is not the same as the *display* format).
>>
>> Consider:
>>
>> clear
>> set seed 1234567
>> set obs 10
>> gen double sales = round(uniform()*100,.001)
>> gen year = _n
>> egen float maxsale = max(sales), by(year)
>> gen equal = sales == maxsale
>>
>> egen double maxsale2 = max(sales), by(year)
>> gen equal2 = sales == maxsale2
>>
>> gen equal3 = float(sales) == maxsale
>>
>> list
>>
>>
>>      +--------------------------------------------------------------+
>>      |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>>      |--------------------------------------------------------------|
>>   1. |   2.65      1      2.65       0       2.65        1        1 |
>>   2. | 17.274      2    17.274       0     17.274        1        1 |
>>   3. |  2.923      3     2.923       0      2.923        1        1 |
>>   4. | 75.377      4    75.377       0     75.377        1        1 |
>>   5. | 65.559      5    65.559       0     65.559        1        1 |
>>      |--------------------------------------------------------------|
>>   6. | 81.163      6    81.163       0     81.163        1        1 |
>>   7. | 17.459      7    17.459       0     17.459        1        1 |
>>   8. | 24.531      8    24.531       0     24.531        1        1 |
>>   9. | 11.195      9    11.195       0     11.195        1        1 |
>>  10. | 75.953     10    75.953       0     75.953        1        1 |
>>      +--------------------------------------------------------------+
>>
>>
>> If that's the case, then you need to ensure that your sales and maxsale
>> variables are in the same storage precision (float, double); OR you need to
>> explicitly round the one that is double-precision to float precision when
>> you make the comparison, using the float() function.
>>
>> See -help precision- for more on what's going on here.
>>
>>
>>
>> On 1/24/2014 11:55 AM, R Zhang wrote:
>>>
>>> Thanks to you both, Sergiy and Nick.
>>>
>>> Nick,
>>>
>>> 1. Are you saying that I should follow Sergiy's advice to change the
>>> format? If so, given the large number of observations I have, how do
>>> I automate the process?
>>>
>>> 2. If I do not change the format: I listed some observations below to
>>> show you that sales and maxsale look the same. However, when I use "l
>>> if sales == maxsale", it does not list all of the observations that
>>> appear equal.
>>>
>>>
>>> *****************
>>>       +--------------------+
>>>       |   sales   maxsale1 |
>>>       |--------------------|
>>>    1. |  25.395     25.395 |
>>>    2. |  32.007     32.007 |
>>>    3. |  53.798     53.798 |
>>>    4. |  12.748     12.748 |
>>>    5. |  13.793     13.793 |
>>>   ..... omitted to save space
>>>
>>>   31. | 166.181    166.181 |
>>>   32. |  21.927    166.181 |
>>>   33. |  26.328    189.897 |
>>>   34. |  31.787    189.897 |
>>>   35. | 189.897    189.897 |
>>>       |--------------------|
>>>   36. | 264.582    264.582 |
>>>   37. |   33.61    264.582 |
>>>   38. | 312.227    312.227 |
>>>   39. |  35.413    312.227 |
>>>   40. |  406.36     406.36 |
>>>       |--------------------|
>>>   41. | 444.875    444.875 |
>>>
>>>
>>>   egen maxsale=max(sales), by (gvkey year)
>>>
>>>   l if sales == maxsale,
>>>
>>> the first observation that is listed is  444.875    444.875 ,
>>>
>>> why is that?
>>>
>>> thanks!
>>>
>>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
>>>>
>>>> This is very good advice in general, but in this case the maxima are
>>>> selected from the original values, so that equality is to be expected
>>>> for some observations.
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
>>>>>
>>>>> Zhang, avoid comparing floating-point numbers for equality. Instead,
>>>>> there is a c-class system value, c(epsfloat), which you can refer to when
>>>>> you need to deal with precision:
>>>>>
>>>>> clear
>>>>> input float sales
>>>>> 25.395
>>>>> 32.007
>>>>> end
>>>>>
>>>>> list
>>>>>
>>>>> display c(epsfloat)
>>>>>
>>>>> list if sales==25.395
>>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>>
>>>>> list if sales==32.007
>>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>>
>>>>>
>>>>> Best, Sergiy Radyakin
>>>>>
>>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> I would do this differently:
>>>>>>
>>>>>> *------------------ begin example ------------------
>>>>>> // get some example data
>>>>>> sysuse auto
>>>>>>
>>>>>> // create a variable denoting missing values
>>>>>> gen byte miss = missing(rep78, price)
>>>>>>
>>>>>> // create our indicator variable
>>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>>
>>>>>> // admire the result
>>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>>> *------------------- end example -------------------
>>>>>> * (For more on examples I sent to the Statalist see:
>>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>>
>>>>>> Hope this helps,
>>>>>> Maarten
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
>>>>>>>
>>>>>>> Dear Statalist,
>>>>>>>
>>>>>>> my data structure is as follows
>>>>>>>
>>>>>>> firmID    segmentID   sales year
>>>>>>> 1001       1               25.395     1990
>>>>>>> 1001       1                32.007     1991
>>>>>>>
>>>>>>> ............
>>>>>>>
>>>>>>> A firm can operate in multiple segments, as identified by segmentID.
>>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>>
>>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>>
>>>>>>> then I did
>>>>>>> gen PriSIC=0
>>>>>>> replace PriSIC=1 if sales==maxsale
>>>>>>>
>>>>>>> I got
>>>>>>> firmID    segmentID   sales year                  maxsale    prisic
>>>>>>> 1001       1               25.395     1990            25.395         0
>>>>>>> 1001       1                32.007     1991            32.007       0
>>>>>>>
>>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>>> (sales-maxsale). It shows a very small negative number, and the data
>>>>>>> dictionary shows sales format float %12.0g, and maxsale format float
>>>>>>> %9.0g
>>>>>>>
>>>>>>> what should I do to correct this?
>>>>>>>
>>>>>>> thanks!!!
>>>>>>>
>>>>>>> Rochelle
>>>>>>> *
>>>>>>> *   For searches and help try:
>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ---------------------------------
>>>>>> Maarten L. Buis
>>>>>> WZB
>>>>>> Reichpietschufer 50
>>>>>> 10785 Berlin
>>>>>> Germany
>>>>>>
>>>>>> http://www.maartenbuis.nl
>>>>>> ---------------------------------

