Thanks, Nick and Phil, for your helpful answers!
Jian
> Phil gave an excellent answer, and I just want to add one more
> detail.
>
> If all you want to do is to extract the last three digits of
> a numeric ID, you can use
>
> mod(ID, 1000)
>
> You may have been taught about the modulus function under the
> name of remainder (or the equivalent in your first
> language).
>
> mod(21557127, 1000)
>
> is the remainder (what is left over) after dividing
> 21557127 by 1000, namely 127.
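>
> As a minimal sketch of how this might look in a do-file (last3 is a
> made-up variable name, and ID is assumed to be stored as a long or
> double so that it holds the full number exactly):
>
> gen int last3 = mod(ID, 1000)
> list ID last3 if last3 == 127
>
> For integer IDs of this size the result is an exact integer, so the
> comparison with 127 should behave as expected.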
>
> Nick
>
> Phil Schumm replied to Jian Zhang
>
> > > I have a problem with numeric precision. I can't figure out what
> > > happened. I hope that you can help me out. Thanks.
> > >
> > > Here is the data:
> > > ID
> > > 21557127
> > >
> > > Then I ran the following do-file to try to extract the last three
> > > digits from the ID:
> > >
> > > gen double temxxx=(ID/1000)
> > > gen temyyy=int(temxxx)
> > > gen temzzz=temxxx-temyyy
> > > gen areaxxx=(temzzz*1000)
> > > drop temxxx temyyy temzzz
> > >
> > > the generated data looks like the following:
> > > ID areaxxx
> > > 21557127 127
> > >
> > > However, when I typed: list if areaxxx==127, Stata in fact listed
> > > nothing!
> > >
> > > At first I thought it might be because areaxxx is a floating-point
> > > variable, so I typed: list if areaxxx==float(127). However, Stata
> > > listed nothing again.
>
> > First, let me say that if all you want to do is to extract the last
> > three digits of the ID, here is the way to do it:
> >
> > . di real(substr(string(ID,"%12.0g"),-3,.))
> > 127
> >
> > Note that if you just use string(ID) this will not work, as string()
> > uses a default format which is not wide enough for your ID (%12.0g is
> > the default format for the long storage type, which I presume is how
> > your ID variable is stored).
> >
> > Second, this is exactly the reason why you should not store IDs as
> > numbers -- you should store them as strings instead. For example, if
> > ID were a string variable, then extracting the last three digits
> > would be even simpler:
> >
> > . di substr(ID,-3,.)
> > 127
> >
> > and would be guaranteed to work no matter how long your IDs are
> > (provided they are no longer than 244 characters).
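> >
> > As a rough sketch of the conversion, assuming ID is currently a
> > long (ID_str and last3str are made-up names for illustration):
> >
> > gen str12 ID_str = string(ID, "%12.0g")
> > gen str3 last3str = substr(ID_str, -3, .)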
> >
> > Finally, what happened above? The problem was indeed due to the
> > error inherent in floating-point arithmetic. For example, here is
> > the calculation you performed:
> >
> > . di %24.18f float( 1000 * float( (21557127/1000) - float( int(21557127/1000) ) ) )
> > 127.000007629394531250
> >
> > which, as you can see, is not equal to 127. Let's take a closer look:
> >
> > float( 1000 * float( (21557127/1000) - float( int(21557127/1000) ) ) )
> >                      ---- temxxx ---             ---- temxxx ---
> >                                        --------- temyyy ----------
> >               ---------------------- temzzz ------------------------
> > -------------------------------- areaxxx -----------------------------
> >
> > Notice how I am using the float() function to mimic the fact that,
> > although you created temxxx as a double, you did not do so for the
> > other intermediate variables. Now in this case, had you also
> > created temzzz as a double, you would have gotten what you wanted:
> >
> > . assert float( 1000 * ( (21557127/1000) - float( int(21557127/1000) ) ) ) == 127
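> >
> > Applied to the original do-file, a sketch of that fix might look as
> > follows. The round() call at the end is an addition not discussed
> > above; it is there as a safeguard so that areaxxx is an exact
> > integer even if a tiny error survives the double arithmetic:
> >
> > gen double temxxx = ID/1000
> > gen double temyyy = int(temxxx)
> > gen double temzzz = temxxx - temyyy
> > gen int areaxxx = round(temzzz*1000)
> > drop temxxx temyyy temzzz
> > list if areaxxx == 127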
> >
> > However, as I said above, it is nearly always better to store IDs
> > such as these as string variables.
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>