Hi all,
I'm a relatively new Stata user, and I'm trying to merge a couple of large datasets where neither the master nor the using dataset has a unique key.
The data comes in this format:
Dataset 1: (note that LINKIDX is not unique)
    EVNTIDX       LINKIDX       EVENTYR  EVENTMM  EVENTDD  ...
1.  300020190021  300020190083  2006     8        6
2.  300020190021  300020190052  2006     8        6
3.  300110100795  300110101161  2006     4        10
4.  300110100822  300110101161  2006     7        19
5.  300110100808  300110101161  2006     5        8
Dataset 2: (note that LINKIDX is not unique)
    LINKIDX       DUPERSID  RXRECIDX         ...
1.  300020190083  30002019  300020190083001
2.  300020190083  30002019  300020198849002
3.  300110101161  30011010  300110101161001
4.  300110101161  30011010  300110101161003
I have already performed a merge in which I restricted dataset 1 to the observations whose LINKIDX is unique and linked them to the multiple matching observations in dataset 2 (a one-to-many merge). For the data above, this links observation 1 in dataset 1 to observations 1 and 2 in dataset 2.
However, I would like to perform a random link for the remaining observations. That is, observations 3-5 in dataset 1 share their LINKIDX with observations 3 and 4 in dataset 2, and I would like Stata to randomly pick one of those dataset 1 observations to merge with each matching observation in dataset 2.
I am not sure whether I should simply use the merge command, because it may systematically select one particular observation in dataset 1 rather than a random one.
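To make the intended result concrete, here is a rough sketch of the logic I have in mind, written in Python rather than Stata purely for illustration. It uses only the sample rows listed above; the function name random_link and the seed value are hypothetical names of my own, and I am assuming that each dataset 2 observation should receive exactly one randomly chosen dataset 1 observation with the same LINKIDX.

```python
import random
from collections import defaultdict

# Sample rows copied from the listings above (key columns only).
dataset1 = [
    {"EVNTIDX": "300020190021", "LINKIDX": "300020190083"},
    {"EVNTIDX": "300020190021", "LINKIDX": "300020190052"},
    {"EVNTIDX": "300110100795", "LINKIDX": "300110101161"},
    {"EVNTIDX": "300110100822", "LINKIDX": "300110101161"},
    {"EVNTIDX": "300110100808", "LINKIDX": "300110101161"},
]
dataset2 = [
    {"LINKIDX": "300020190083", "RXRECIDX": "300020190083001"},
    {"LINKIDX": "300020190083", "RXRECIDX": "300020198849002"},
    {"LINKIDX": "300110101161", "RXRECIDX": "300110101161001"},
    {"LINKIDX": "300110101161", "RXRECIDX": "300110101161003"},
]


def random_link(master, using, seed=12345):
    """Hypothetical helper: give each row of `using` one randomly
    chosen row of `master` that has the same LINKIDX."""
    rng = random.Random(seed)  # fixed seed (assumed) so the draw is reproducible

    # Group the master rows by LINKIDX.
    by_link = defaultdict(list)
    for row in master:
        by_link[row["LINKIDX"]].append(row)

    linked = []
    for row in using:
        candidates = by_link.get(row["LINKIDX"])
        if not candidates:
            continue  # no master row with this LINKIDX; skipped in this sketch
        pick = rng.choice(candidates)  # uniform draw among the matches
        linked.append({**row, "EVNTIDX": pick["EVNTIDX"]})
    return linked


linked = random_link(dataset1, dataset2)
for row in linked:
    print(row["RXRECIDX"], "->", row["EVNTIDX"])
```

When a LINKIDX has only one observation in dataset 1, the draw is trivial and reproduces the one-to-many merge I already have; the random choice only matters for groups such as observations 3-5.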
Any ideas as to how I might be able to accomplish this task?
Thank you in advance!
Regards,
Anna Dijkstra