Richard Lenhardt experienced a surprise when
merging files:
> Sorted both files by studyid.
> Merged file B into file A.
> Works fine, except that one studyid was duplicated. _merge variable was
> "3" for both copies.
This means that file A or file B contains more than one record with the
same studyid. For instance, if the duplicate was in file B:

            File A                        File B
    studyid   other vars          studyid   other vars
       2116   4 5 2 9                2116   90401
                                     2116   90402

Result:

    studyid   other vars          _merge
       2116   4 5 2 9   90401          3
       2116   4 5 2 9   90402          3

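The example above can be reproduced with a short sketch. The file names
(fileA, fileB) and the variable names (v1-v4, code) are made up for
illustration; only studyid and the values come from the tables above.

```stata
* File A: one observation for studyid 2116 (v1-v4 are hypothetical names)
clear
input studyid v1 v2 v3 v4
2116 4 5 2 9
end
sort studyid
save fileA, replace

* File B: two observations for studyid 2116 (code is a hypothetical name)
clear
input studyid long code
2116 90401
2116 90402
end
sort studyid
save fileB, replace

* Merge file B into file A using the old-style -merge- syntax
use fileA, clear
merge studyid using fileB
list
```

The listing should show two observations for studyid 2116, each carrying
the file A values and each with _merge equal to 3.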
What -merge- did in this case should make sense to you. File B had two
observations with studyid=2116, so -merge- duplicated the single studyid=2116
observation in File A and then merged. This can be of great use. For
instance, one might have a file of persons, and in the person file is recorded
the state in which they live. One might have another file of state
characteristics. One could merge the two files by state and then have a file
of persons, with characteristics of states appropriately duplicated.
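A minimal sketch of that persons-and-states merge, assuming two
hypothetical datasets: persons.dta (one observation per person, with a
variable named state) and states.dta (one observation per state, keyed by
the same variable). It uses the old-style -merge- syntax, which requires
both files to be sorted by the match variable.

```stata
* states.dta and persons.dta are hypothetical file names
use states, clear
sort state
save states, replace

use persons, clear
sort state

* Each state's characteristics are copied onto every person in that state
merge state using states
tabulate _merge
```

In Stata 11 and later, the same intent would be written as
-merge m:1 state using states-, which also states explicitly that
duplicates are expected only in the person file.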
In most cases, however, the id variable is unique, or supposed to be unique,
and in those cases, I recommend specifying -merge-'s option -unique-. It
will not solve the problem, but it will look for the problem and issue an
error message if it finds it. If that is the problem, then the question
becomes how File B (or File A) ended up with a duplicate observation when
it should not have.
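A sketch of how one might check, again assuming hypothetical files named
fileA and fileB, both already sorted by studyid. The first part asks
-merge- to refuse duplicates; the second part goes looking for them in
file B with -duplicates- and -isid-.

```stata
* Ask -merge- to stop with an error if studyid is not unique in either file
use fileA, clear
merge studyid using fileB, unique

* If that fails, find the offending observations (repeat for fileA as needed)
use fileB, clear
duplicates report studyid
duplicates list studyid

* -isid- exits with an error unless studyid uniquely identifies observations
isid studyid
```

None of this fixes the data; it only shows which studyid values are
repeated, so that one can work out where the extra observation came from.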
-- Bill