Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: R Zhang <r05zhang@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: The accuracy of the float data type
Date: Sat, 25 Jan 2014 14:57:54 -0500
Nick, thank you very much for helping me and many others!!!

-Rochelle

On Fri, Jan 24, 2014 at 7:41 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> OK then;
>
> bysort firmID year : egen double maxsale=max(sales)
>
> is, we bet, your solution.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 24 January 2014 20:13, R Zhang <r05zhang@gmail.com> wrote:
>
>> sales was double %12.0g, maxsale was float %9.0g. My apologies.
>
> On Fri, Jan 24, 2014 at 1:09 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
>>> I wondered that too, but Rochelle said that both variables were
>>> -float-. But if that is not so, then it's likely to be the
>>> explanation.
>>>
>>> Note by the way that Stata does not use terminology such as "storage
>>> format". Display format and variable type are, as Nick Winter implies,
>>> quite different notions.
>
> On 24 January 2014 17:32, Nick Winter <njgwinter@gmail.com> wrote:
>
>>>> Perhaps the problem arises because the *storage* formats of sales and
>>>> maxsale are different. (This is not the same as the *display* format.)
>>>>
>>>> Consider:
>>>>
>>>> clear
>>>> set seed 1234567
>>>> set obs 10
>>>> gen double sales = round(uniform()*100,.001)
>>>> gen year = _n
>>>> egen float maxsale = max(sales), by(year)
>>>> gen equal = sales == maxsale
>>>>
>>>> egen double maxsale2 = max(sales), by(year)
>>>> gen equal2 = sales == maxsale2
>>>>
>>>> gen equal3 = float(sales) == maxsale
>>>>
>>>> list
>>>>
>>>>
>>>>      +---------------------------------------------------------------+
>>>>      |   sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>>>>      |---------------------------------------------------------------|
>>>>   1. |    2.65      1      2.65       0       2.65        1        1 |
>>>>   2. |  17.274      2    17.274       0     17.274        1        1 |
>>>>   3. |   2.923      3     2.923       0      2.923        1        1 |
>>>>   4. |  75.377      4    75.377       0     75.377        1        1 |
>>>>   5. |  65.559      5    65.559       0     65.559        1        1 |
>>>>      |---------------------------------------------------------------|
>>>>   6. |  81.163      6    81.163       0     81.163        1        1 |
>>>>   7. |  17.459      7    17.459       0     17.459        1        1 |
>>>>   8. |  24.531      8    24.531       0     24.531        1        1 |
>>>>   9. |  11.195      9    11.195       0     11.195        1        1 |
>>>>  10. |  75.953     10    75.953       0     75.953        1        1 |
>>>>      +---------------------------------------------------------------+
>>>>
>>>>
>>>> If that's the case, then you need to ensure that your sales and maxsale
>>>> variables are in the same storage precision (float, double), OR you need
>>>> to explicitly round the one that is double-precision to float precision
>>>> when you make the comparison, using the float() function.
>>>>
>>>> See -help precision- for more on what's going on here.
>
> On 1/24/2014 11:55 AM, R Zhang wrote:
>
>>>>> Thanks to you both, Sergiy and Nick.
>>>>>
>>>>> Nick,
>>>>>
>>>>> 1. Are you saying that I should follow Sergiy's advice to change
>>>>> format? If so, given the large number of observations I have, how do
>>>>> I automate the process?
>>>>>
>>>>> 2. If I do not change the format: I listed some observations below to
>>>>> show you that sales and maxsale look the same; however, when I use
>>>>> "l if sales == maxsale", it does not list all of the observations that
>>>>> appear equal.
>>>>>
>>>>>
>>>>> *****************
>>>>>      +--------------------+
>>>>>      |   sales   maxsale1 |
>>>>>      |--------------------|
>>>>>   1. |  25.395     25.395 |
>>>>>   2. |  32.007     32.007 |
>>>>>   3. |  53.798     53.798 |
>>>>>   4. |  12.748     12.748 |
>>>>>   5. |  13.793     13.793 |
>>>>> ..... omitted to save space
>>>>>
>>>>>  31. | 166.181    166.181 |
>>>>>  32. |  21.927    166.181 |
>>>>>  33. |  26.328    189.897 |
>>>>>  34. |  31.787    189.897 |
>>>>>  35. | 189.897    189.897 |
>>>>>      |--------------------|
>>>>>  36. | 264.582    264.582 |
>>>>>  37. |   33.61    264.582 |
>>>>>  38. | 312.227    312.227 |
>>>>>  39. |  35.413    312.227 |
>>>>>  40. |  406.36     406.36 |
>>>>>      |--------------------|
>>>>>  41. | 444.875    444.875 |
>>>>>
>>>>>
>>>>> egen maxsale=max(sales), by(gvkey year)
>>>>>
>>>>> l if sales == maxsale
>>>>>
>>>>> The first observation that is listed is 444.875 444.875. Why is that?
>>>>>
>>>>> Thanks!
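Why 444.875 is the first value listed can be checked directly. Assuming, as Rochelle later confirmed, that sales is stored as double while maxsale ended up as float (egen, like generate, creates the default storage type, float, unless a type is given), an observation can only satisfy sales == maxsale when its double value is exactly representable as a float. 444.875 is 444 + 7/8, which has an exact binary representation; 25.395, 166.181 and 406.36 do not. The minimal sketch below re-enters a few values from the listing (an illustration only, not the original data; the variable same_as_float is hypothetical) and tests each one with float():

```stata
* Sketch: which decimal values survive a double -> float round trip?
* Values copied from the listing above; same_as_float is a made-up name.
clear
input double sales
25.395
166.181
406.36
444.875
end

* float() rounds its argument to the nearest float-representable value,
* so the comparison is true only if nothing is lost in that rounding
generate byte same_as_float = float(sales) == sales
list sales same_as_float
```

Only 444.875 should get same_as_float = 1. Storing the maximum as double, or comparing float(sales) with a float maxsale, removes the dependence on this accident of binary representation.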
>>>>>
>>>>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>
>>>>>> This is very good advice in general, but in this case the maxima are
>>>>>> selected from the original values, so that equality is to be expected
>>>>>> for some observations.
>>>>>> Nick
>>>>>> njcoxstata@gmail.com
>>>>>>
>>>>>>
>>>>>> On 24 January 2014 16:31, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
>>>>>>>
>>>>>>> Zhang, avoid comparing floating-point numbers for equality. Instead,
>>>>>>> there is a c-class value, c(epsfloat), which you can refer to when
>>>>>>> you need to deal with precision:
>>>>>>>
>>>>>>> clear
>>>>>>> input float sales
>>>>>>> 25.395
>>>>>>> 32.007
>>>>>>> end
>>>>>>>
>>>>>>> list
>>>>>>>
>>>>>>> display c(epsfloat)
>>>>>>>
>>>>>>> list if sales==25.395
>>>>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>>>>
>>>>>>> list if sales==32.007
>>>>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>>>>
>>>>>>>
>>>>>>> Best, Sergiy Radyakin
>>>>>>>
>>>>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <maartenlbuis@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I would do this differently:
>>>>>>>>
>>>>>>>> *------------------ begin example ------------------
>>>>>>>> // get some example data
>>>>>>>> sysuse auto
>>>>>>>>
>>>>>>>> // create a variable denoting missing values
>>>>>>>> gen byte miss = missing(rep78, price)
>>>>>>>>
>>>>>>>> // create our indicator variable
>>>>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>>>>
>>>>>>>> // admire the result
>>>>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>>>>> *------------------- end example -------------------
>>>>>>>> * (For more on examples I sent to the Statalist see:
>>>>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>> Maarten
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <r05zhang@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Dear Statalist,
>>>>>>>>>
>>>>>>>>> My data structure is as follows:
>>>>>>>>>
>>>>>>>>> firmID  segmentID   sales  year
>>>>>>>>>   1001          1  25.395  1990
>>>>>>>>>   1001          1  32.007  1991
>>>>>>>>>
>>>>>>>>> ............
>>>>>>>>>
>>>>>>>>> A firm can operate in multiple segments, as identified by segmentID.
>>>>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>>>>
>>>>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>>>>
>>>>>>>>> Then I did
>>>>>>>>> gen PriSIC=0
>>>>>>>>> replace PriSIC=1 if sales==maxsale
>>>>>>>>>
>>>>>>>>> I got:
>>>>>>>>> firmID  segmentID   sales  year  maxsale  prisic
>>>>>>>>>   1001          1  25.395  1990   25.395       0
>>>>>>>>>   1001          1  32.007  1991   32.007       0
>>>>>>>>>
>>>>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>>>>> (sales - maxsale); it shows a very small negative number, and the data
>>>>>>>>> dictionary shows sales as float %12.0g and maxsale as float %9.0g.
>>>>>>>>>
>>>>>>>>> What should I do to correct this?
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
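Putting the thread's two suggestions together, here is a minimal sketch on a handful of made-up observations shaped like Rochelle's data (firmID, segmentID, sales, year). The values and the variable names maxsale_f and PriSIC_f are hypothetical, and sales is assumed to be stored as double, as in her real data. Option 1 is Nick Cox's fix (store the maximum as double too); option 2 follows Nick Winter's alternative (keep a float maximum and round sales with float() before comparing).

```stata
* Sketch on hypothetical data shaped like Rochelle's; sales assumed double
clear
input int firmID byte segmentID double sales int year
1001 1 25.395 1990
1001 2 12.748 1990
1001 1 32.007 1991
1001 2 53.798 1991
end

* Option 1 (Nick Cox): store the group maximum as double, like sales,
* so the maximum is an exact copy of one of the original values
bysort firmID year : egen double maxsale = max(sales)
generate byte PriSIC = sales == maxsale

* Option 2 (Nick Winter): keep a float maximum, but round sales to
* float precision before the comparison
bysort firmID year : egen float maxsale_f = max(sales)
generate byte PriSIC_f = float(sales) == maxsale_f

list firmID segmentID sales year PriSIC PriSIC_f, sepby(firmID year)
```

Both indicators should flag the 25.395 segment in 1990 and the 53.798 segment in 1991. Note that if two segments tie for the largest sales within a firm-year, both approaches above flag both; Maarten Buis's sort-based approach (_n == _N within sorted groups) flags exactly one, with the choice among tied observations left to the sort order.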