Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Tolerance for -merge- variable
From: Rob Ploutz-Snyder <[email protected]>
To: [email protected]
Subject: Re: st: Tolerance for -merge- variable
Date: Thu, 29 Mar 2012 10:37:22 -0500
Thank you, Nick, for your prompt reply to my post. You have stated
my problem exactly, and more clearly than I did.
The precision problem becomes even more troublesome when different
software packages are involved. In my case, I received one dataset
with IDs that were generated in Excel. I have no idea what the
precision is there, but I know that it doesn't align nicely with the
other dataset, whose (decimal) IDs were generated in Stata.
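The mismatch is easy to reproduce outside Stata. The sketch below is
in Python, not Stata, and only illustrates the underlying binary
arithmetic. It assumes that a -float- behaves like a 4-byte IEEE 754
single and a -double- like an 8-byte IEEE 754 double; the helper name
as_float32 is invented for the illustration.

```python
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest 4-byte single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


id_double = 1.1              # 1.1 held in 8 bytes
id_single = as_float32(1.1)  # the "same" 1.1 held in 4 bytes

# Hexadecimal output shows the exact stored value of each ID.
print(id_double.hex())
print(id_single.hex())

# The two values agree to many decimal places but are not identical,
# so an exact comparison does not treat them as equal.
print(id_double == id_single)
print(abs(id_double - id_single) < 1e-6)
```

Two IDs can therefore look the same when displayed with one decimal
place and still fail an exact match.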
Alas, I guess I am stuck with converting IDs to string for the merge.
I had been especially avoiding this solution because I had assumed
that I couldn't then use a string ID variable as an identifier in
Stata's -xtmixed- or other xt routines, and so would have to convert
back to a numeric ID. The good news is that I seem to be able to use
a string variable for that too, so I suppose Stata's -merge- behavior
is alright in the end.
Still, I stubbornly admit that I wish -merge- had a tolerance option
that we could tweak so that, on our instruction, it would treat IDs
within ?? decimals as equal.
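Without such an option, a similar effect can be had by building the
tolerance into the key itself. What follows is a minimal sketch in
Python, not Stata, assuming the IDs are meant to carry exactly one
decimal place. The names as_float32, rounded_key, master, and using
are hypothetical. Each ID is scaled and rounded to an integer before
matching, which is analogous to merging on a string created with an
explicit format.

```python
import struct


def as_float32(x: float) -> float:
    """Round x to the nearest 4-byte single-precision value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rounded_key(x: float, decimals: int = 1) -> int:
    """Scale by 10**decimals and round, so nearby values share one integer key."""
    return round(x * 10 ** decimals)


# Made-up example data: one file holds IDs as doubles, the other as singles.
master = {1.1: "a", 2.1: "b", 3.1: "c"}
using = {as_float32(1.1): 10, as_float32(2.1): 20, as_float32(4.1): 40}

# Exact matching on the raw IDs finds nothing in common.
exact = [k for k in master if k in using]

# Matching on the rounded keys pairs up the IDs that agree to one decimal.
master_by_key = {rounded_key(k): v for k, v in master.items()}
using_by_key = {rounded_key(k): v for k, v in using.items()}
matched = {k: (master_by_key[k], using_by_key[k])
           for k in master_by_key if k in using_by_key}

print(exact)
print(matched)
```

This is rounding rather than a true tolerance: two values that sit on
either side of a rounding boundary would still get different keys, so
it is only safe when the intended number of decimals is known.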
Again--thank you!
Rob
On Wed, Mar 28, 2012 at 1:43 PM, Nick Cox <[email protected]> wrote:
> My understanding is that there is _no_ tolerance. Equal matches,
> unequal doesn't. What implies otherwise?
>
> More specifically,
>
> 1. Like you, I wouldn't by preference use a non-integer numeric
> variable as an identifier, largely because of worries that things like
> this might happen.
>
> 2. This is to be expected if one variable is -float- and the other
> -double-, as then x.1 (or whatever) will be stored as different binary
> approximations. See documentation on precision, passim.
>
> 3. If the variables are the same type, please show us (a) minimal
> datasets and (b) -merge- syntax which shows your problem. But you
> should first use hexadecimal formats to see if the identifiers really
> are identical. If not, -merge- is behaving as expected.
>
> 4. Otherwise, my best advice is that conversion to string must use an
> explicit format argument to maximise your chances, e.g. -string(myvar,
> "%18.1f")-.
>
> Nick
>
> On Wed, Mar 28, 2012 at 7:26 PM, Rob Ploutz-Snyder
> <[email protected]> wrote:
>
>> I notice that when I have an ID variable stored with 1 decimal place
>> (ex. id=id+0.1) in two separate data files, the merge command
>> sometimes fails to equate ID values that are equal within rounding
>> error. This is particularly problematic if Stata generated one of
>> these id variables (ex. gen idnew=id+0.1) and Excel or some other
>> software generated the id variable in the other dataset (including
>> hand data entry).
>>
>> Is there a way to adjust the tolerance that -merge- uses on the ID var
>> that is in both data sets so that it links properly out to (for
>> example) 1 or 2 or 3 digits past the decimal??
>>
>> My only solution so far is to generate a string variable from the
>> numeric ID variables in each dataset and then use the string variable
>> for the -merge- but it seems like there should be a simpler way to
>> tweak the tolerance within -merge-. My other solution is to try to
>> avoid circumstances when the unique ID is a non-integer, but that's
>> not always an option for me.
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/