Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: The accuracy of the float data type
From: R Zhang <[email protected]>
To: [email protected]
Subject: Re: st: The accuracy of the float data type
Date: Sat, 25 Jan 2014 14:57:54 -0500
Nick,
Thank you very much for helping me and many others!
-Rochelle
On Fri, Jan 24, 2014 at 7:41 PM, Nick Cox <[email protected]> wrote:
> OK then;
>
> bysort firmID year : egen double maxsale=max(sales)
>
> is, we bet, your solution.
>
> Nick
> [email protected]
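
The suggestion above can be sketched end to end as follows. This is a minimal illustration, not code from the thread: the four data rows are hypothetical (only 25.395 and 32.007 appear in the original question), and it assumes that sales is stored as double, as reported later in the thread.

```stata
* Hypothetical example data: one firm, two segments, two years
clear
input firmID segmentID double sales year
1001 1 25.395 1990
1001 2 12.748 1990
1001 1 32.007 1991
1001 2 13.793 1991
end

* Store the maximum in the same type as sales, so no rounding occurs
bysort firmID year : egen double maxsale = max(sales)

* The largest segment is now found by an exact comparison
gen byte PriSIC = sales == maxsale

list, sepby(firmID year)
```

Because maxsale is a copy of one of the double values in sales, the equality test is exact here; no tolerance is needed.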
>
>
> On 24 January 2014 20:13, R Zhang <[email protected]> wrote:
>
>> sales was double %12.0g, maxsale was float %9.0g. My apology.
>
> On Fri, Jan 24, 2014 at 1:09 PM, Nick Cox <[email protected]> wrote:
>
>>> I wondered that too, but Rochelle said that both variables were
>>> -float-. But if that is not so, then it's likely to be the
>>> explanation.
>>>
>>> Note by the way that Stata does not use terminology such as "storage
>>> format". Display format and variable type are, as Nick Winter implies,
>>> quite different notions.
>
> On 24 January 2014 17:32, Nick Winter <[email protected]> wrote:
>
>>>> Perhaps the problem comes because the *storage* format of sales and maxsale
>>>> are different. (This is not the same as the *display* format).
>>>>
>>>> Consider:
>>>>
>>>> clear
>>>> set seed 1234567
>>>> set obs 10
>>>> gen double sales = round(uniform()*100,.001)
>>>> gen year = _n
>>>> egen float maxsale = max(sales), by(year)
>>>> gen equal = sales == maxsale
>>>>
>>>> egen double maxsale2 = max(sales), by(year)
>>>> gen equal2 = sales == maxsale2
>>>>
>>>> gen equal3 = float(sales) == maxsale
>>>>
>>>> list
>>>>
>>>>
>>>>      +--------------------------------------------------------------+
>>>>      |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>>>>      |--------------------------------------------------------------|
>>>>   1. |   2.65      1      2.65       0       2.65        1        1 |
>>>>   2. | 17.274      2    17.274       0     17.274        1        1 |
>>>>   3. |  2.923      3     2.923       0      2.923        1        1 |
>>>>   4. | 75.377      4    75.377       0     75.377        1        1 |
>>>>   5. | 65.559      5    65.559       0     65.559        1        1 |
>>>>      |--------------------------------------------------------------|
>>>>   6. | 81.163      6    81.163       0     81.163        1        1 |
>>>>   7. | 17.459      7    17.459       0     17.459        1        1 |
>>>>   8. | 24.531      8    24.531       0     24.531        1        1 |
>>>>   9. | 11.195      9    11.195       0     11.195        1        1 |
>>>>  10. | 75.953     10    75.953       0     75.953        1        1 |
>>>>      +--------------------------------------------------------------+
>>>>
>>>>
>>>> If that's the case, then you need to ensure that your sales and maxsale
>>>> variables are stored at the same precision (float or double), OR you need
>>>> to explicitly round the one that is double-precision to float precision
>>>> when you make the comparison, using the float() function.
>>>>
>>>> See -help precision- for more on what's going on here.
>
> On 1/24/2014 11:55 AM, R Zhang wrote:
>
>>>>> Thanks to you both, Sergiy and Nick .
>>>>>
>>>>> Nick,
>>>>>
>>>>> 1. Are you saying that I should follow Sergiy's advice to change the
>>>>> format? If so, given the large number of observations I have, how do
>>>>> I automate the process?
>>>>>
>>>>> 2. If I do not change the format: I listed some observations below to
>>>>> show you that sales and maxsale look the same. However, when I use "l
>>>>> if sales == maxsale", it does not list all of the observations that
>>>>> appear equal.
>>>>>
>>>>>
>>>>> *****************
>>>>>      +--------------------+
>>>>>      |   sales   maxsale1 |
>>>>>      |--------------------|
>>>>>   1. |  25.395     25.395 |
>>>>>   2. |  32.007     32.007 |
>>>>>   3. |  53.798     53.798 |
>>>>>   4. |  12.748     12.748 |
>>>>>   5. |  13.793     13.793 |
>>>>> ..... omitted to save space
>>>>>
>>>>>  31. | 166.181    166.181 |
>>>>>  32. |  21.927    166.181 |
>>>>>  33. |  26.328    189.897 |
>>>>>  34. |  31.787    189.897 |
>>>>>  35. | 189.897    189.897 |
>>>>>      |--------------------|
>>>>>  36. | 264.582    264.582 |
>>>>>  37. |   33.61    264.582 |
>>>>>  38. | 312.227    312.227 |
>>>>>  39. |  35.413    312.227 |
>>>>>  40. |  406.36     406.36 |
>>>>>      |--------------------|
>>>>>  41. | 444.875    444.875 |
>>>>>
>>>>>
>>>>> egen maxsale=max(sales), by(gvkey year)
>>>>>
>>>>> l if sales == maxsale
>>>>>
>>>>> The first observation that is listed is 444.875 444.875.
>>>>>
>>>>> Why is that?
>>>>>
>>>>> thanks!
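
One plausible explanation for why 444.875 is the first match, assuming (as established later in the thread) that sales is double and maxsale is float: 444.875 is 444 + 7/8, which has an exact binary representation, so its float and double copies are identical. Values such as 166.181 or 406.36 have no exact binary representation, so rounding them to float changes them slightly. A minimal sketch, using three values from the listing and a plain -generate- to stand in for the float result of -egen-:

```stata
clear
input double sales
166.181
406.36
444.875
end

* Mimic a result stored in the default float type
gen float maxsale = sales

* Exact comparison: only values with an exact binary representation survive
gen byte equal = sales == maxsale

list
```

Only the 444.875 row should be flagged as equal, which matches the behaviour described above.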
>>>>>
>>>>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
>>>>>>
>>>>>> This is very good advice in general, but in this case the maxima are
>>>>>> selected from the original values, so that equality is to be expected
>>>>>> for some observations.
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
>>>>>>>
>>>>>>> Zhang, avoid comparing floating-point numbers for equality. Instead,
>>>>>>> there is a c-class value, c(epsfloat), which you can refer to when
>>>>>>> you need to deal with precision:
>>>>>>>
>>>>>>> clear
>>>>>>> input float sales
>>>>>>> 25.395
>>>>>>> 32.007
>>>>>>> end
>>>>>>>
>>>>>>> list
>>>>>>>
>>>>>>> display c(epsfloat)
>>>>>>>
>>>>>>> list if sales==25.395
>>>>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>>>>
>>>>>>> list if sales==32.007
>>>>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>>>>
>>>>>>>
>>>>>>> Best, Sergiy Radyakin
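
A caveat on the tolerance above: 10*c(epsfloat) is an absolute tolerance of roughly 1.2e-06, which suits values of the size shown, but float rounding error grows with the magnitude of the number. A relative tolerance may therefore be safer for larger sales figures. The sketch below is one way to do that with the built-in reldif() function; the choice of c(epsfloat) as the relative threshold is an assumption made for illustration, not a recommendation from the thread.

```stata
clear
input double sales
25.395
406.36
end

* A float copy, as -egen- would create by default
gen float maxsale = sales

* Absolute tolerance: adequate near 25, too tight near 406
gen byte abs_ok = abs(sales - maxsale) <= 10*c(epsfloat)

* Relative tolerance: reldif(x,y) = |x - y| / (|y| + 1)
gen byte rel_ok = reldif(sales, maxsale) <= c(epsfloat)

list
```

Here the absolute test accepts 25.395 but rejects 406.36, while the relative test accepts both.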
>>>>>>>
>>>>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I would do this differently:
>>>>>>>>
>>>>>>>> *------------------ begin example ------------------
>>>>>>>> // get some example data
>>>>>>>> sysuse auto
>>>>>>>>
>>>>>>>> // create a variable denoting missing values
>>>>>>>> gen byte miss = missing(rep78, price)
>>>>>>>>
>>>>>>>> // create our indicator variable
>>>>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>>>>
>>>>>>>> // admire the result
>>>>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>>>>> *------------------- end example -------------------
>>>>>>>> * (For more on examples I sent to the Statalist see:
>>>>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>> Maarten
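
Adapted to the variable names in the original question, the sort-based approach above might look like the following. It avoids the equality comparison altogether, so the storage type of any helper variable no longer matters. The data rows are hypothetical, and the name PriSIC is taken from the question.

```stata
* Hypothetical example data: one firm, two segments, two years
clear
input firmID segmentID double sales year
1001 1 25.395 1990
1001 2 12.748 1990
1001 1 32.007 1991
1001 2 13.793 1991
end

* Mark observations with missing sales so they are never flagged
gen byte miss = missing(sales)

* Within firm-year, sort by sales and flag the last (largest) observation
bysort firmID year miss (sales) : gen byte PriSIC = _n == _N if !miss

list, sepby(firmID year)
```

Note that with tied maxima this flags only one observation per firm-year, whereas the comparison with a stored maximum flags every tied observation.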
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Dear Statalist,
>>>>>>>>>
>>>>>>>>> my data structure is as follows
>>>>>>>>>
>>>>>>>>> firmID  segmentID  sales   year
>>>>>>>>> 1001    1          25.395  1990
>>>>>>>>> 1001    1          32.007  1991
>>>>>>>>>
>>>>>>>>> ............
>>>>>>>>>
>>>>>>>>> A firm can operate in multiple segments, as identified by segmentID.
>>>>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>>>>
>>>>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>>>>
>>>>>>>>> then I did
>>>>>>>>> gen PriSIC=0
>>>>>>>>> replace PriSIC=1 if sales==maxsale
>>>>>>>>>
>>>>>>>>> I got
>>>>>>>>> firmID  segmentID  sales   year  maxsale  prisic
>>>>>>>>> 1001    1          25.395  1990  25.395   0
>>>>>>>>> 1001    1          32.007  1991  32.007   0
>>>>>>>>>
>>>>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>>>>> (sales-maxsale). It shows a very small negative number, and the data
>>>>>>>>> dictionary shows sales with format float %12.0g and maxsale with
>>>>>>>>> format float %9.0g.
>>>>>>>>>
>>>>>>>>> what should I do to correct this?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/