
st: bug: non-unique merge

From	James Muller <[email protected]>
To	[email protected]
Subject	st: bug: non-unique merge
Date	Wed, 5 Oct 2005 17:25:17 +1000

Hello list,

I have noticed a small problem with non-unique merging when the sort
option is used: sort appears to set the 'unique master' (uniqmaster)
option. Here is an example:

----------------------------------

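   * using data: id = 1,...,10, with big = 1 for id >= 5
   clear
   set obs 10
   gen id = sum(1)
   gen big = (id>=5)
   list

   tempfile tmp
   save `tmp'

   * master data: id = 1,1,2,2,...,5,5 (not unique)
   clear
   set obs 10
   gen id = ceil(sum(0.5))
   list

   * non-unique merge on id; this fails when sort is specified
   merge id using `tmp', sort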

----------------------------------


The error I receive is "variable id does not uniquely identify
observations in the master data". If I re-run the above example but
sort each dataset manually and omit the sort option, merge returns a
warning rather than an error.

The help for merge describes the unique options as follows: if one is
specified then uniqueness is required, but otherwise it is not. I
quote:

    If none of the three unique options are specified, observations in
    neither the master nor the using data are required to be unique...

    ... If they are not unique, records that have the same values of
    the match variable are joined by observation until all the records
    on one side or the other are matched and after that the final record
    on the shorter side is duplicated over and over again to match
    with the remaining records needing to be matched on the longer
    side.

The description of the sort option says nothing about uniqueness, so
the above example should produce the dataset:

id    big   _merge
1     0     3
1     0     3
2     0     3
2     0     3
3     0     3
3     0     3
4     0     3
4     0     3
5     1     3
5     1     3
6     1     2
7     1     2
8     1     2
9     1     2
10    1     2
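To check the expected dataset against the rule quoted from the help,
the matching rule can be sketched outside Stata. This is a Python
sketch, not Stata's implementation: the function name nonunique_merge
and the representation of each dataset as a list of dicts are
hypothetical, and it assumes that master values take precedence for
variables present in both datasets and that the result is ordered by
the match variable.

```python
import math
from typing import Any

Record = dict[str, Any]


def nonunique_merge(master: list[Record], using: list[Record], key: str) -> list[Record]:
    """Sketch of the documented non-unique match rule (hypothetical helper).

    Within each key value, records are paired by observation; once the
    shorter side runs out, its final record is reused for the remaining
    records on the longer side.
    """
    def groups(rows: list[Record]) -> dict[Any, list[Record]]:
        # collect records by key value, keeping their original order
        out: dict[Any, list[Record]] = {}
        for row in rows:
            out.setdefault(row[key], []).append(row)
        return out

    m_groups, u_groups = groups(master), groups(using)
    result: list[Record] = []
    for k in sorted(set(m_groups) | set(u_groups)):
        ms, us = m_groups.get(k, []), u_groups.get(k, [])
        if not us:  # key appears only in the master data
            result += [{**m, "_merge": 1} for m in ms]
        elif not ms:  # key appears only in the using data
            result += [{**u, "_merge": 2} for u in us]
        else:
            for i in range(max(len(ms), len(us))):
                m = ms[min(i, len(ms) - 1)]  # last master record repeats
                u = us[min(i, len(us) - 1)]  # last using record repeats
                # assumption: master values win for shared variables
                result.append({**u, **m, "_merge": 3})
    return result


if __name__ == "__main__":
    # the two datasets from the example above
    using = [{"id": i, "big": int(i >= 5)} for i in range(1, 11)]
    master = [{"id": math.ceil(0.5 * n)} for n in range(1, 11)]
    for row in nonunique_merge(master, using, "id"):
        print(row["id"], row["big"], row["_merge"])
```

Run on the two datasets from the example, it yields 15 observations:
ids 1 to 5 twice each with _merge 3 (big is 1 only for id 5), followed
by ids 6 to 10 with _merge 2.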

It isn't anything serious, I know, but it's not the stated behaviour
for merge.

Cheers

James

