Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: reversible -destring-, precision, longs v doubles
From: László Sándor <[email protected]>
To: [email protected]
Subject: Re: st: reversible -destring-, precision, longs v doubles
Date: Wed, 7 Aug 2013 08:30:45 -0400
Thanks, Sergiy, Nick.
FWIW, I can just say that I have many, many data sources with the same
identifiers. Actually, -egen group()- would not guarantee the same
numerical ids for the same strings if the files cover different
universes (to merge).
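
A minimal sketch of what I mean by consistent ids across files, assuming a
shared crosswalk built from the union of all string ids. The toy data, the
variable name idstr, and the tempfile names are made up for illustration:

```stata
clear all
tempfile a b xwalk

* Two toy files covering different universes (made-up data)
input str12 idstr
"000000000017"
"000000000042"
end
save `a'

clear
input str12 idstr
"000000000042"
"000000000099"
end
save `b'

* Build one crosswalk from the union of all string ids
use `a', clear
append using `b'
duplicates drop idstr, force
sort idstr
generate long id = _n
save `xwalk'

* Attach the shared numeric id to each file
use `b', clear
merge 1:1 idstr using `xwalk', keep(match) nogenerate
list

use `a', clear
merge 1:1 idstr using `xwalk', keep(match) nogenerate
list
```

Because every file takes its id from the same crosswalk, "000000000042" maps
to the same number in both files. Strings that first appear in later files
would need the crosswalk to be extended rather than rebuilt, or the numbering
can shift.
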
And it is error-prone to leave IDs behind. I am not sure the risk
would be lower than with converting to the numbers.
But I appreciate the advice and will keep it in mind.
Thanks!
On Tue, Aug 6, 2013 at 6:15 PM, Sergiy Radyakin <[email protected]> wrote:
> Laszlo, why don't you create your own ids? Consider the following example:
>
> sysuse auto, clear
> isid make
> egen long id=group(make)
> isid id
> list id make
>
> Your generated ids will be 'nice' in the sense that you don't need to
> worry about leading zeroes, they will be numeric, and they will fit
> within the limits of the long type. Even if you have to merge multiple
> different files it's doable with a few more lines, and it saves you
> a lot of headache later on. It is also a more universal approach, as
> neither float nor double would be able to accommodate something like
> 12-char wide region followed by 12-char wide PSU followed by 12-char
> wide HH number id that is easily handled by -egen group()-. And if you
> need any components of ID separately, (like a region code in the
> previous example) extract it before converting the IDs into the
> numeric form.
>
> All credit of course goes to NJC:
> http://www.stata.com/support/faqs/data-management/creating-group-identifiers/
>
> Best, Sergiy Radyakin
>
> PS: It seems that statistical offices are not just 'fond of 10 digits
> or more' as you write, but they are simply using software that is
> handling large numbers as strings. CSPro is one such example. One
> simply declares the width of the field in digits, whether decimal
> point is present or implied, etc. That is very flexible, and you can
> have an ID of any length.
>
>
> On Tue, Aug 6, 2013 at 5:36 PM, László Sándor <[email protected]> wrote:
>> Thanks, Nick, as always.
>>
>> I am actually still confused, and maybe it is not just me: Could you
>> discuss when the reversibility check would fail?
>>
>> From Bill's penultimate guide to precision, esp. points 4.3, 4.4, I
>> gather that the IDs will be unique (not rounded) if my system is set
>> to double as the default datatype. Permanently.
>> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>>
>> Still, it is a bit scary to risk rounding your identifiers by a
>> mistaken float somewhere. On the other hand, string identifiers cannot
>> be panel IDs for xtset, so I need to bite the bullet.
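
To make the float risk concrete, here is a small sketch with made-up 10-digit
ids. It uses only built-in commands; the variable names id_d and id_f are
hypothetical:

```stata
clear
set obs 2

* Two distinct 10-digit ids, too large for a long
generate double id_d = 9876543210 + _n
* The same values stored as float (about 7 significant digits)
generate float id_f = id_d
format id_d id_f %12.0f
list

display c(maxlong)          // largest value a long can hold
display %20.0f 2^53         // doubles hold integers exactly up to here

isid id_d                   // passes: doubles keep the ids distinct
capture noisily isid id_f   // fails: both ids round to the same float
```

The point of the sketch is that a float does not fail loudly: the two ids
simply collapse onto one value, and only a check such as -isid- reveals it.
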
>>
>> Thanks again,
>>
>> Laszlo
>>
>> On Tue, Aug 6, 2013 at 12:29 PM, Nick Cox <[email protected]> wrote:
>>> I guess I wrote some zeroth version of that.
>>>
>>> Conversion is reversible if real(string(<original>)) = <original> or
>>> string(real(<original>)) = <original> where <original> is whatever you
>>> feed in and -string()- can use whatever format is specified.
>>>
>>> What this amounts to is a stipulation that you must lose no
>>> information, crucial if you change your mind about what should be done
>>> to the data.
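
One way to read that definition, sketched with made-up strings. This
illustrates the round trip only and is not meant as what -destring- does
internally; the names idstr, idnum and roundtrip are hypothetical:

```stata
clear
input str12 idstr
"1234567890"
"0012345678"
"12a4"
end

* real() returns missing when the text is not a number
generate double idnum = real(idstr)

* string() with an explicit format turns the number back into text;
* the flag is 1 only when we get the original string back
generate byte roundtrip = string(idnum, "%20.0f") == idstr
list
```

With this format only the first string survives the round trip: the second
loses its leading zeros and the third is not numeric at all. A different
format passed to string() could change the verdict, which is why the format
is part of the definition.
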
>>>
>>> So, a reversible potato peeler or university education would restore
>>> the potatoes or the students to their original state.
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 6 August 2013 17:03, László Sándor <[email protected]> wrote:
>>>> I ran into an error with identifiers longer than -maxlong()- before
>>>> (blame statistical offices fond of 10 digits or more). So now I wanted
>>>> to be careful while destringing, but you cannot specify the type for
>>>> the result — however, -destring- breaks if the process is not
>>>> "reversible." What does it mean exactly? I cannot find it documented.
>>>> (Actually, the default type for -destring- is double, so it is surely
>>>> not the case that -destring- only produces longs unless forced to.)
>>>>
>>>> Do I need to worry about my identifiers becoming imprecise or rounded
>>>> if -destring- did not warn me?
>>>>
>>>> The documentation of -tostring- does contain the following, but this
>>>> is not exactly the same thing.
>>>>
>>>> Conversion of numeric data to string equivalents can be problematic.
>>>> Stata, like most software, holds numeric data to finite precision and
>>>> in binary form. See the discussion in [U] 13.11 Precision and problems
>>>> therein. If no format() is specified, tostring uses the format %12.0g.
>>>> This format is, in particular, sufficient to convert integers held as
>>>> bytes, ints, or longs to string equivalent without loss of precision.
>>>> However, users will often need to specify a format themselves,
>>>> especially when the numeric data have fractional parts and for some
>>>> reason a conversion to string is required.
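
A sketch of a belt-and-braces check after converting, assuming made-up ids
and the hypothetical variable names idstr, idnum and back. It converts to
numeric, converts back with an explicit format, and confirms that nothing
changed:

```stata
clear
input str12 idstr
"1234567890"
"9876543210"
end

* String to numeric; -describe- shows the storage type that was chosen
destring idstr, generate(idnum)
describe idnum

* Numeric back to string with an explicit integer format
tostring idnum, generate(back) format(%12.0f)

* Stop with an error if any id changed or if ids are no longer unique
assert back == idstr
isid idnum
```

If either check fails, the do-file stops there, so a rounded identifier
cannot slip through unnoticed.
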
>>>>
>>>> Thanks!
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/