I think this will give you the mean over code1 values:
assume files are sorted on code1, code2...
. merge code1 using <code2file>
. keep if _merge!=2 /* drop code2 with no matching code1 */
. by code1, sort: egen avg_code2=mean(code2)
. by code1, sort: keep if _n==1
. ren avg_code2 code2
cheers,
jeph
Jason Hwang wrote:
Hello,
I can't find help on this problem on Statalist. I would greatly appreciate
your help.
I have two datasets which I'm trying to merge. Problem is that the mapping
is not one-to-one. Consider the following example:
code1 code2
1111 2290
1111 2291
1111 2292
1111 2292
1111 2290
5555 6700
5555 6701
5555 6700
As you can see sometime a single "code1" maps to many "code2"s. I need a
one-to-one mapping to merge the data and I'm trying to do this two ways.
1. Take a unweighted average of the "code2"s for each "code1." That is
let's say for 1111, I just want an average of a variable for 2290, 2291
and 2992 in the second dataset.
But how do I do this when I have hundreds of unique "code1"s, and each
corresponds to a different number of "code2"s? (I can imagine the code if
each code1 all mapped to the same number of code2s but this is not the
case.)
2. Use the code2 with maximum number of frequency. That is, 5555 will map
to 6700 since 6700 occurs twice while 6701 occurs once. For 1111, both
2290 and 2292 occur twice and in this case, I'll just take the one that
appears first, 2290.
I'm not sure how to write the code for this procedure. I would greatly
appreciate any help. Thank you.
Jason
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/