Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: reversible -destring-, precision, longs v doubles
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: reversible -destring-, precision, longs v doubles
Date: Wed, 7 Aug 2013 14:21:47 +0100
That does make your life more difficult but not impossible so far as I can see.
You could, for example, -merge- on the string identifiers and use
numeric identifiers for -xt- commands. After all, many of us present
two or more names depending on mood and circumstance.
The only real problem is if you have numeric identifiers too big to
hold exactly in -double-s (a -double- holds every integer exactly only
up to 2^53 = 9,007,199,254,740,992).
Nick
[email protected]
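To make the -double- limit concrete, here is a minimal Stata sketch using two hypothetical 17-digit string identifiers that differ only in the last digit. Above 2^53 a -double- cannot represent every integer, so -real()- maps both strings to the same number: the string copy remains a unique key, but the numeric copy does not.

```stata
* Hypothetical 17-digit identifiers, both above 2^53 = 9007199254740992
clear
input str17 id_str
"20000000000000000"
"20000000000000001"
end
* The strings are distinct, so they identify observations uniquely
isid id_str
* Near 2e16 doubles are spaced 4 apart, so the second value rounds to the first
generate double id_num = real(id_str)
format id_num %20.0f
list
* The numeric copy is no longer a unique key, so isid exits with an error
capture isid id_num
display "return code from isid id_num: " _rc
```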
On 7 August 2013 13:30, László Sándor <[email protected]> wrote:
> Thanks, Sergiy, Nick.
>
> FWIW, I have many, many data sources with the same
> identifiers. Actually, -egen group()- would not guarantee the same
> numerical ids for the same strings if the files cover different
> universes (to be merged).
>
> And it is error-prone to leave IDs behind. I am not sure the risk
> would be lower than with converting them to numbers.
>
> But I appreciate the advice, and I will keep it in mind.
>
> Thanks!
>
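Laszlo's point about -egen group()- holds when it is run separately on each file. One workaround is sketched below in Stata, assuming a hypothetical string identifier id_str shared by two small made-up files: stack the identifiers from every file, number each distinct one once in a single crosswalk, and -merge- that crosswalk back into each file, so the same string always receives the same numeric id. The string id is kept alongside it for later merges, in the spirit of Nick's suggestion above.

```stata
* Two tiny hypothetical files that share the id "000000000456"
clear
tempfile file1 file2 crosswalk
input str12 id_str
"000000000123"
"000000000456"
end
save "`file1'"
clear
input str12 id_str
"000000000456"
"000000000789"
end
save "`file2'"

* Crosswalk: stack the string ids from all files and number each distinct one once
clear
foreach f in file1 file2 {
    append using "``f''", keep(id_str)
}
duplicates drop id_str, force
egen long id = group(id_str)
save "`crosswalk'"

* Attach the common numeric id to every file, keeping the string id as well
foreach f in file1 file2 {
    use "``f''", clear
    merge m:1 id_str using "`crosswalk'", assert(match using) keep(match) nogenerate
    save "``f''", replace
}
```

Because every file's ids are in the crosswalk, each observation must match, which -assert(match using)- enforces; crosswalk rows belonging only to other files are dropped by -keep(match)-.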
> On Tue, Aug 6, 2013 at 6:15 PM, Sergiy Radyakin <[email protected]> wrote:
>> Laszlo, why don't you create your own ids? Consider the following example:
>>
>> sysuse auto, clear
>> isid make
>> egen long id=group(make)
>> isid id
>> list id make
>>
>> Your generated ids will be 'nice' in the sense that you don't need to
>> worry about leading zeroes, they will be numeric, and they will fit
>> within the limits of the long type. Even if you have to merge multiple
>> different files, it's doable with a few more lines, and it saves you
>> a lot of headaches later on. It is also a more universal approach, as
>> neither float nor double could accommodate something like a
>> 12-character region code followed by a 12-character PSU code followed
>> by a 12-character household number, which is easily handled by
>> -egen, group()-. And if you need any component of the ID separately
>> (like the region code in the previous example), extract it before
>> converting the IDs into numeric form.
>>
>> All credit of course goes to NJC:
>> http://www.stata.com/support/faqs/data-management/creating-group-identifiers/
>>
>> Best, Sergiy Radyakin
>>
>> PS: It seems that statistical offices are not just 'fond of 10 digits
>> or more' as you write; they are simply using software that
>> handles large numbers as strings. CSPro is one such example. One
>> simply declares the width of the field in digits, whether a decimal
>> point is present or implied, etc. That is very flexible, and you can
>> have an ID of any length.
>>
>>
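A small Stata sketch of Sergiy's last point, assuming a hypothetical household id made of a 3-character region, a 3-character PSU and a 3-character household part (shorter than the 12-character fields above, only to keep the listing readable). The region code is pulled out with -substr()- while the full string is still available, and only then are compact numeric ids built with -egen, group()-.

```stata
* Hypothetical layout: characters 1-3 region, 4-6 PSU, 7-9 household
clear
input str9 hhid
"R01P01H01"
"R01P01H02"
"R02P07H01"
end
* Extract the component first, while the full string id is still available
generate str3 region = substr(hhid, 1, 3)
* Compact numeric ids; each fits easily in a long
egen long hh_id = group(hhid)
egen long region_id = group(region)
isid hh_id
list
```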
>> On Tue, Aug 6, 2013 at 5:36 PM, László Sándor <[email protected]> wrote:
>>> Thanks, Nick, as always.
>>>
>>> I am actually still confused, and maybe it is not just me: Could you
>>> discuss when the reversibility check would fail?
>>>
>>> From Bill's "The Penultimate Guide to Precision", especially points 4.3
>>> and 4.4, I gather that the IDs will be unique (not rounded) if my system
>>> is set to use double as the default storage type. Permanently.
>>> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>>>
>>> Still, it is a bit scary to risk rounding your identifiers through a
>>> stray float somewhere. On the other hand, string identifiers cannot
>>> be panel IDs for -xtset-, so I need to bite the bullet.
>>>
>>> Thanks again,
>>>
>>> Laszlo
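The "stray float" risk is easy to see in a minimal Stata sketch, assuming a hypothetical 10-digit identifier. A -double- keeps every digit, while a -float- (which holds every integer exactly only up to 16,777,216) rounds it. -set type double- changes only the default for variables created without an explicit type; a storage type written out explicitly, as for id_float below, still takes precedence.

```stata
* Hypothetical 10-digit identifier
clear
input str10 id_str
"1234567891"
end
* Make double the default for new variables (this session only; add -, permanently- to keep it)
set type double
generate id_default = real(id_str)
* Explicit storage types override the default
generate float  id_float  = real(id_str)
generate double id_double = real(id_str)
format id_default id_float id_double %12.0f
list
```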
>>>
>>> On Tue, Aug 6, 2013 at 12:29 PM, Nick Cox <[email protected]> wrote:
>>>> I guess I wrote some zeroth version of that.
>>>>
>>>> Conversion is reversible if real(string(<original>)) == <original>
>>>> (numeric to string, as in -tostring-) or string(real(<original>)) ==
>>>> <original> (string to numeric, as in -destring-), where <original> is
>>>> whatever you feed in and -string()- can use whatever format is specified.
>>>>
>>>> What this amounts to is a stipulation that you must lose no
>>>> information, which is crucial if you change your mind about what
>>>> should be done to the data.
>>>>
>>>> So, a reversible potato peeler or university education would restore
>>>> the potatoes or the students to their original state.
>>>> Nick
>>>> [email protected]
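The string(real(<original>)) condition can be checked by hand before converting. Below is a minimal Stata sketch, assuming a hypothetical string variable id_str and the integer format %20.0f; it mirrors the definition above and is not a copy of -destring-'s own internal check. It flags two common ways identifiers lose information: leading zeros, and digits beyond what a -double- can hold.

```stata
* Hypothetical identifiers: one clean, one with a leading zero, one above 2^53
clear
input str17 id_str
"1234567890"
"0123456789"
"20000000000000001"
end
generate double id_num = real(id_str)
format id_num %20.0f
* Reversible means the number, written back with the chosen format, reproduces the string
generate byte reversible = string(id_num, "%20.0f") == id_str
list, noobs
```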
>>>>
>>>>
>>>> On 6 August 2013 17:03, László Sándor <[email protected]> wrote:
>>>>> I previously ran into an error with identifiers larger than -maxlong()-
>>>>> (blame statistical offices fond of 10 digits or more). So now I wanted
>>>>> to be careful while destringing. You cannot specify the type of the
>>>>> result; however, -destring- stops if the conversion is not
>>>>> "reversible." What exactly does that mean? I cannot find it documented.
>>>>> (Actually, the default type for -destring- is double, so it is surely
>>>>> not the case that -destring- produces only longs unless forced to.)
>>>>>
>>>>> Do I need to worry about my identifiers becoming imprecise or rounded
>>>>> if -destring- did not warn me?
>>>>>
>>>>> The documentation of -tostring- does contain the following, but this
>>>>> is not exactly the same thing.
>>>>>
>>>>> Conversion of numeric data to string equivalents can be problematic.
>>>>> Stata, like most software, holds numeric data to finite precision and
>>>>> in binary form. See the discussion in [U] 13.11 Precision and problems
>>>>> therein. If no format() is specified, tostring uses the format %12.0g.
>>>>> This format is, in particular, sufficient to convert integers held as
>>>>> bytes, ints, or longs to string equivalent without loss of precision.
>>>>> However, users will often need to specify a format themselves,
>>>>> especially when the numeric data have fractional parts and for some
>>>>> reason a conversion to string is required.
>>>>>
>>>>> Thanks!
>>>>>
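The manual's advice to specify a format matters for long identifiers in particular. A minimal Stata sketch, assuming a hypothetical 13-digit identifier held as a -double-: under the default %12.0g it would not fit in 12 columns and would be written in exponential notation, which cannot be converted back exactly, so an integer format wide enough for every digit is supplied instead, followed by the real(string(<original>)) round-trip check.

```stata
* Hypothetical 13-digit identifier stored as a double
clear
input double id_num
1234567890123
end
* An integer format wide enough for every digit keeps the conversion reversible
tostring id_num, generate(id_str) format(%15.0f)
* Round trip in the other direction: converting the string back must give the number
assert real(id_str) == id_num
format id_num %15.0f
list, noobs
```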
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>> * http://www.ats.ucla.edu/stat/stata/