Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: reversible -destring-, precision, longs v doubles
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: reversible -destring-, precision, longs v doubles
Date: Tue, 6 Aug 2013 18:15:21 -0400

Laszlo, why don't you create your own ids? Consider the following example:

    sysuse auto, clear
    isid make
    egen long id=group(make)
    isid id
    list id make

Your generated ids will be 'nice' in the sense that you don't need to
worry about leading zeroes: they will be numeric, and they will fit
within the limits of the long type. Even if you have to merge several
different files, it's doable with a few more lines, and it saves you
a lot of headaches later on. It is also a more universal approach:
neither float nor double can hold every digit of something like a
12-character region code followed by a 12-character PSU code followed
by a 12-character HH number, yet such an id is easily handled by
-egen-'s group() function. And if you need any component of the ID
separately (like the region code in the previous example), extract it
before converting the IDs into numeric form.
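
A minimal sketch of that last point, assuming a hypothetical
36-character string id called hhid_str built as region + PSU + HH
number (the variable names and the three values are made up purely for
illustration):

```stata
* Hypothetical data: 12-char region + 12-char PSU + 12-char HH number
clear
input str36 hhid_str
"000000000001000000000002000000000003"
"000000000001000000000002000000000004"
"000000000002000000000001000000000001"
end

* Two distinct ids that differ only in the last digit collapse to the
* same double, so converting the full string to a number is not safe
display real("000000000001000000000002000000000003") == real("000000000001000000000002000000000004")

* Extract the region code while the id is still a string
generate str12 region = substr(hhid_str, 1, 12)

* Then build a compact numeric id that fits in a long
egen long id = group(hhid_str)
isid id
list id region hhid_str
```

The original string id stays in the data, so nothing is lost: the
numeric id is only a convenient stand-in for commands that need a
numeric identifier.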
All credit of course goes to NJC:
http://www.stata.com/support/faqs/data-management/creating-group-identifiers/
Best, Sergiy Radyakin
PS: It seems that statistical offices are not just 'fond of 10 digits
or more', as you write; they are simply using software that handles
large numbers as strings. CSPro is one such example. One simply
declares the width of the field in digits, whether a decimal point is
present or implied, etc. That is very flexible, and you can have an ID
of any length.
On Tue, Aug 6, 2013 at 5:36 PM, László Sándor <[email protected]> wrote:
> Thanks, Nick, as always.
>
> I am actually still confused, and maybe it is not just me: Could you
> discuss when the reversibility check would fail?
>
> From Bill's penultimate guide to precision, esp. points 4.3, 4.4, I
> gather that the IDs will be unique (not rounded) if my system is set
> to double as the default datatype. Permanently.
> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>
> Still, it is a bit scary to risk rounding your identifiers by a
> mistaken float somewhere. On the other hand, string identifiers cannot
> be panel IDs for xtset, so I need to bite the bullet.
>
> Thanks again,
>
> Laszlo
>
> On Tue, Aug 6, 2013 at 12:29 PM, Nick Cox <[email protected]> wrote:
>> I guess I wrote some zeroth version of that.
>>
>> Conversion is reversible if real(string(<original>)) = <original> or
>> string(real(<original>)) = <original> where <original> is whatever you
>> feed in and -string()- can use whatever format is specified.
>>
>> What this amounts to is a stipulation that you must lose no
>> information, crucial if you change your mind about what should be done
>> to the data.
>>
>> So, a reversible potato peeler or university education would restore
>> the potatoes or the students to their original state.
>> Nick
>> [email protected]
>>
>>
>> On 6 August 2013 17:03, László Sándor <[email protected]> wrote:
>>> I ran into an error with identifiers longer than -maxlong()- before
>>> (blame statistical offices fond of 10 digits or more). So now I wanted
>>> to be careful while destringing, but you cannot specify the type for
>>> the result — however, -destring- breaks if the process is not
>>> "reversible." What does it mean exactly? I cannot find it documented.
>>> (Actually, the default type for -destring- is double, so it is surely
>>> not the case that -destring- only produces longs unless forced to.)
>>>
>>> Do I need to worry about my identifiers becoming imprecise or rounded
>>> if -destring- did not warn me?
>>>
>>> The documentation of -tostring- does contain the following, but this
>>> is not exactly the same thing.
>>>
>>> Conversion of numeric data to string equivalents can be problematic.
>>> Stata, like most software, holds numeric data to finite precision and
>>> in binary form. See the discussion in [U] 13.11 Precision and problems
>>> therein. If no format() is specified, tostring uses the format %12.0g.
>>> This format is, in particular, sufficient to convert integers held as
>>> bytes, ints, or longs to string equivalent without loss of precision.
>>> However, users will often need to specify a format themselves,
>>> especially when the numeric data have fractional parts and for some
>>> reason a conversion to string is required.
>>>
>>> Thanks!
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/