Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Different results for 1:1-merging using the same variables (int & string)
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Different results for 1:1-merging using the same variables (int & string)
Date: Thu, 14 Feb 2013 10:09:17 +0000
A side detail but
gen newstrvar = strvar + string(numvar)
is a simpler way to do that in this situation where -numvar- contains
4-digit integers. Calling up -tostring- and creating a new variable
can be avoided.
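As a minimal sketch of that point, the following compares the one-step route with the -tostring- route. The data values are made up for illustration, and -numvar_s- and -newstrvar2- are hypothetical names introduced only for the comparison.

```stata
* Toy data: a 6-character string ID and a 4-digit integer (made-up values)
clear
input str6 strvar int numvar
"00123A" 2005
"00123A" 2006
"98765B" 2005
end

* One-step route: string() converts the number inside the expression
gen newstrvar = strvar + string(numvar)

* Two-step route via -tostring-, shown only for comparison
tostring numvar, gen(numvar_s)
gen newstrvar2 = strvar + numvar_s

* Both routes should give the same 10-character key
assert newstrvar == newstrvar2
assert length(newstrvar) == 10
```

This relies on -numvar- holding 4-digit integers; for other numeric content a display format passed as the second argument of string() may be needed.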
More to the point, did you look closely at some or all of the
observations where you get a different result?
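One possible way to look at those observations is to run both merges against the same master data, store each result under its own name, and list the rows where the two disagree. This is only a sketch on made-up toy data: the tempfile names and the -m_combined- and -m_separate- variables are hypothetical, and the third row of the toy using file is deliberately built with a combined key that does not agree with its parts.

```stata
* Toy using file: the third row's combined key disagrees with its parts
clear
input str6 acquirorcusip int announcement_year str10 acquirorcusip_year
"00123A" 2005 "00123A2005"
"55555C" 2005 "55555C2005"
"98765B" 2006 "98765B2005"
end
tempfile using_all using_combined using_separate
save `using_all'
keep acquirorcusip_year
save `using_combined'
use `using_all', clear
keep acquirorcusip announcement_year
save `using_separate'

* Toy master file, with the combined key built from its parts
clear
input str6 acquirorcusip int announcement_year
"00123A" 2005
"00123A" 2006
"98765B" 2005
end
gen acquirorcusip_year = acquirorcusip + string(announcement_year)

* Run both merges on the same master observations, keeping each result
merge 1:1 acquirorcusip_year using `using_combined', keep(master match) gen(m_combined)
merge 1:1 acquirorcusip announcement_year using `using_separate', keep(master match) gen(m_separate)

* Cross-tabulate the two results and list the observations that differ
tab m_combined m_separate
list if m_combined != m_separate
```

Restricting each merge to keep(master match) keeps the number of observations fixed at the master's, so the two indicators can be compared row by row.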
Nick
On Thu, Feb 14, 2013 at 8:17 AM, Hofbaur, Ulrich <[email protected]> wrote:
> Jeph, thanks for your suggestions! "next_year" is an integer converted to string using the -tostring- command, so I simply add two strings. I created the variables in both files in exactly the same way; I just validated that.
> Jeph Herrin wrote:
> I do not understand how you can calculate length(next_year) if "next_year" is an integer.
>
> Do you create the variables in both files, or just the -master- file? If just the -master- check that the -using- file variables have been constructed in the same way.
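The check suggested here could be sketched as follows, to be run on each file in turn. The data are a made-up stand-in for one of the files, -check_key- and -key_differs- are hypothetical names, and the third row is deliberately inconsistent so that the check has something to flag.

```stata
* Toy stand-in for one of the two files (values are made up)
clear
input str6 acquirorcusip int announcement_year str10 acquirorcusip_year
"00123A" 2005 "00123A2005"
"00123A" 2006 "00123A2006"
"98765B" 2005 "98765B2006"
end

* Rebuild the combined key from its parts and flag disagreements
gen check_key = acquirorcusip + string(announcement_year)
gen byte key_differs = check_key != acquirorcusip_year
list if key_differs

* Leading or trailing blanks in the string key would also break matches
count if acquirorcusip != trim(acquirorcusip)

* For a 1:1 merge, each key must identify observations uniquely
isid acquirorcusip announcement_year
isid acquirorcusip_year
```

If the combined key in either file was not built from the same two variables used in the separate-variable merge, the two merges need not agree.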
On 2/13/2013 1:07 PM, Hofbaur, Ulrich wrote:
>> I have an issue with conducting a 1:1 merge in Stata. The merge is based on two variables. The first variable (string) consists of exactly 6 digits. The second variable (integer) consists of exactly 4 digits (no variation in the length of digits in either of the two variables). I tried two versions, and they yielded different results. Please further note that I use the same file to merge and the variables differ
>>
>> Option 1: Define a 10-digit string variable. To do so, convert "var 2" to string and then concatenate var1 and var2. This gives "var3" (a 10-digit string; again no variation in length), and I merge (1:1) on "var3". → Results in 15,839 matches
>> Option 2: Merge (1:1) on var1 and var2 as separate variables → Results in 14,227 matches
>>
>> Does anybody know where this difference comes from? My gut feeling tells me that Option 2 is the more reliable one. However, I lack evidence for that. The abbreviated do-file is attached.
>>
>> Thank you very much for your support!
>>
>> Best,
>> Ulrich
>>
>> ******* Do File **************
>>
>> use F:\001_Forschung\Daten\Cash&Acquisitions\file_A_prelim.dta, clear
>>
>> * Option 1
>> gen acquirorcusip_year=cusip_6dgt+next_year // corresponds to var 3 in the above description
>> gen length_cusip_6dgt=length(cusip_6dgt)
>> gen length_announcement_year=length(next_year)
>> gen length_acquirorcusip_year=length(acquirorcusip_year)
>> sum length_cusip_6dgt length_announcement_year length_acquirorcusip_year
>>
>> Variable          Obs    Mean    Std. Dev.    Min    Max
>> length_cus~t   196217       6            0      6      6
>> length_ann~r   196217       4            0      4      4
>> length_acq~r   196217      10            0     10     10
>>
>> * Option 2
>> gen announcement_year=next_year // corresponds to var 2 in the above description. Rename due to file_B
>> destring announcement_year, replace
>> gen acquirorcusip=cusip_6dgt // corresponds to var 1 in the above description.
>> sort acquirorcusip announcement_year
>>
>> save file_A.dta, replace
>>
>>
>> * Option 1: Merge on the joint string variable
>> use file_A.dta, clear
>> merge 1:1 acquirorcusip_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                       191,640
>>         from master                   180,378  (_merge==1)
>>         from using                     11,262  (_merge==2)
>>
>>     matched                            15,839  (_merge==3)
>>     -----------------------------------------
>>
>> * Option 2: Merge on two separate variables
>> use file_A.dta, clear
>> merge 1:1 acquirorcusip announcement_year using F:\001_Forschung\Daten\Cash&Acquisitions\file_B.dta
>>
>>     Result                           # of obs.
>>     -----------------------------------------
>>     not matched                       194,864
>>         from master                   181,990  (_merge==1)
>>         from using                     12,874  (_merge==2)
>>
>>     matched                            14,227  (_merge==3)
>>     -----------------------------------------