Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: tricky data merge/joinby problem
From: "Dimitriy V. Masterov" <[email protected]>
To: [email protected]
Subject: Re: st: tricky data merge/joinby problem
Date: Fri, 4 Mar 2011 12:35:24 -0500
Just to follow up on this for posterity: the "panelized" merge approach
seems to be roughly twice as fast as the joinby method on fake data.
Simple code is below.
#delimit;
version 11.1;
set more off;
capture trace off;
clear all;
macro drop _all;
set mem 5g;
tempfile file1 file2 joinbydata;
/* create panel data */
input
bgid str6 dateyq bgpop;
1 2010q1 100;
1 2010q2 105;
1 2010q3 106;
1 2010q4 125;
2 2010q1 110;
2 2010q2 115;
2 2010q3 116;
2 2010q4 135;
end;
save `file2';
clear;
/* create fraction data */
input
bid bgid fracpop;
11 1 .5;
12 1 .5;
21 2 .3;
22 2 .2;
23 2 .5;
end;
save `file1';
/* (1) joinby approach */
timer on 1;
joinby bgid using `file2';
timer off 1;
sort bid dateyq;
list, sepby(bid);
save `joinbydata';
/* (2) panelize and merge approach */
use `file1', clear;
timer on 2;
expand 4;
sort bid;
bys bid: gen dateyq=string(_n);
strrec dateyq ("1"="2010q1") ("2"="2010q2") ("3"="2010q3")
("4"="2010q4"), replace;
/* list, sepby(bid); */
merge m:1 bgid dateyq using `file2';
timer off 2;
timer list 1;
timer list 2;
sort bid dateyq;
list, sepby(bid);
drop _merge;
/* Compare approaches */
cf * using `joinbydata', all;
/* m:m merge fails to reproduce the joinby result */
use `file1', clear;
merge m:m bgid using `file2';
drop _merge;
capture noisily cf * using `joinbydata', all;
timer clear;
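For anyone who wants to reproduce the panelize step without strrec, here is a minimal sketch that builds the quarter labels with the built-in functions yq() and string() and the %tq format. It assumes the quarters run consecutively starting in 2010q1, recreates the same fake data so it runs on its own in the default (cr) delimiter mode, and uses the tempfile name panel purely as a placeholder. The assert(match) option makes merge stop with an error if any block-quarter fails to find its panel row.

```stata
version 11.1
clear

* Panel data: population by block group (bgid) and quarter
input bgid str6 dateyq bgpop
1 2010q1 100
1 2010q2 105
1 2010q3 106
1 2010q4 125
2 2010q1 110
2 2010q2 115
2 2010q3 116
2 2010q4 135
end
tempfile panel
save `panel'

* Fraction data: population share of each block (bid) within its bgid
clear
input bid bgid fracpop
11 1 .5
12 1 .5
21 2 .3
22 2 .2
23 2 .5
end

* Panelize: one copy of each block per quarter; label the copies
* 2010q1..2010q4 (assumes four consecutive quarters from 2010q1)
expand 4
bysort bid: gen str6 dateyq = string(yq(2010, 1) + _n - 1, "%tq")

* Each block-quarter should match exactly one panel row
merge m:1 bgid dateyq using `panel', assert(match) nogenerate
sort bid dateyq
list, sepby(bid)
```

With 2 blocks in bgid 1, 3 blocks in bgid 2, and 4 quarters each, the result has 2x4 + 3x4 = 20 rows, the same count joinby gives. By contrast, merge m:m pairs observations sequentially within each bgid and reuses the last observation of the shorter group, so it returns only max(2,4) + max(3,4) = 8 rows, which is why the m:m comparison above does not match.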