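Your most common values can be obtained by

bysort code1 code2 : gen count = -_N
bysort code1 (count code2) : gen mode = code2[1]

Note the minus sign in the first line. With count negated, the most
frequent code2 sorts first within each code1, and ties go to the
smallest code2.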
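
To get from there to one row per code1 and the merge itself, something
like the following might do. It is only a sketch under assumptions: the
file names dataset1 and dataset2 and the tempfile name modes are made up
for illustration, the 1:1 merge syntax needs Stata 11 or later, and
pchange is taken to be constant within each code1-code2 pair, as it is
in the example data.

```stata
* Sketch only: dataset1, dataset2 and modes are assumed names.
use dataset2, clear

* Negated group size, so the most frequent code2 sorts first
bysort code1 code2 : gen count = -_N
bysort code1 (count code2) : gen mode = code2[1]

* Keep a single observation per code1, for its modal code2
keep if code2 == mode
bysort code1 : keep if _n == 1
keep code1 code2 pchange

tempfile modes
save `modes'

* dataset1 is the master; each code1 now matches at most one row
use dataset1, clear
merge 1:1 code1 using `modes'
```

With the example data this should leave 3431 and .5 against code1 1111,
and 9903 and .4 against code1 5555.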
Nick
Jason Hwang wrote:
> I didn't describe very well last time what I wanted to do. Let me try
> again.
>
> I have two datasets I'm trying to merge of the following form.
>
> dataset1:
>
> code1 output
> 1111 100
> 5555 340
>
> dataset2:
>
> code2 pchange code1
> 3431 .5 1111
> 3431 .5 1111
> 3450 -.5 1111
> 3451 .7 1111
> 9903 .4 5555
> 9945 .1 5555
> 9903 .4 5555
> 9905 -.6 5555
> 9945 .1 5555
>
> I'm trying to use dataset1 as the original (master) and merge into it
> dataset2. Problem: each code1 maps to many code2s. So here's what I
> would like to do: for each code1, find a code2 which corresponds to it
> with the greatest frequency. So for code1, 1111, I want 3431. For 5555,
> both 9903 and 9945 occur twice. In this case, I'll just take whichever
> shows up first in the sorted list; i.e. 9903.
>
> The final output I'm looking for would be:
>
> code1 code2 output pchange
> 1111 3431 100 .5
> 5555 9903 340 .4
>
> Could someone show me how to write code for this procedure? Thank you
> very much.