Hello list,
I have noticed a small problem with non-unique merging when the sort
option is used: sort appears to set the 'unique master' option. I'll
give an example:
----------------------------------
clear
set obs 10
gen id = sum(1)
gen big = (id>=5)
list
tempfile tmp
save `tmp'
clear
set obs 10
gen id = ceil(sum(0.5))
list
merge id using `tmp', sort
----------------------------------
The error I receive is "variable id does not uniquely identify
observations in the master data". If I re-run the above example but
sort each dataset manually and omit the sort option, merge returns a
warning rather than an error.
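For comparison, here is a minimal sketch of the manually sorted variant
described above. It assumes the same old-style merge syntax as the
example and only adds explicit sort commands; the tempfile name is the
one already used in this post.

```stata
* Using data: ids 1..10, sorted by hand before saving
clear
set obs 10
gen id = sum(1)
gen big = (id>=5)
sort id
tempfile tmp
save `tmp'

* Master data: ids 1,1,2,2,...,5,5, also sorted by hand
clear
set obs 10
gen id = ceil(sum(0.5))
sort id

* No sort option here, so no uniqueness requirement is implied
merge id using `tmp'
tab _merge
```

The tab _merge at the end is only there to check how many observations
matched and how many came from the using data alone.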
The help for merge says that if one of the 'unique' options is
specified then uniqueness is a requirement, but otherwise it is not. I
quote:
If none of the three unique options are specified, observations in
neither the master nor the using data are required to be unique...
... If they are not unique, records that have the same values of
the match variable are joined by observation until all the records
one side or the other are matched and after that the final record
on the shorter side is duplicated over and over again to match
with the remaining records needing to be matched on the longer
side.
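A small sketch of the rule quoted above, as I read it. The variables m
and u and the tempfile name short are made up for illustration, and the
old-style merge syntax without any unique option is assumed. The master
has three records for id 1 and the using data has two, so the last
using record should be repeated for the third master record.

```stata
* Using data: two records with id 1 (the shorter side)
clear
input id str1 u
1 "a"
1 "b"
end
sort id
tempfile short
save `short'

* Master data: three records with id 1 (the longer side)
clear
input id m
1 10
1 20
1 30
end
sort id

* Joined by observation; the final using record is reused
merge id using `short'
list
```

If the help text is right, the third observation should carry u equal
to "b" and every observation should have _merge equal to 3.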
Nothing about this is said in the description of the sort option, so
the above example should produce the dataset:
id big _merge
1 0 3
1 0 3
2 0 3
2 0 3
3 0 3
3 0 3
4 0 3
4 0 3
5 1 3
5 1 3
6 1 2
7 1 2
8 1 2
9 1 2
10 1 2
It isn't anything serious, I know, but it's not the stated behaviour
for merge.
Cheers
James