Re: st: duplicated studyid when merging files

From	[email protected] (William Gould, Stata)
To	[email protected]
Subject	Re: st: duplicated studyid when merging files
Date	Mon, 05 Dec 2005 08:59:43 -0600

Richard Lenhardt <[email protected]> experienced a surprise when
merging files:

> Sorted both files by studyid. 
> Merged file B into file A.
> Works fine, except that one studyid was duplicated.  _merge variable was 
> "3" for both copies. 

What this means is that there is more than one record with the same 
studyid in file B or in file A.  For instance, if the original duplicate 
was in file B, 

       File A                          File B
       studyid    other vars           studyid   other vars
          2116    4 5 2 9                 2116   90401
                                          2116   90402

    Result:

       studyid     other vars       _merge
          2116     4 5 2 9 90401         3
          2116     4 5 2 9 90402         3
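
A minimal sketch that reproduces the little example above, assuming the old-style -merge- syntax.  The file names fileA and fileB and the variable names a1-a4 and b are made up for illustration; they are not from Richard's data.

```stata
* Build a hypothetical file B with two observations for studyid 2116
clear
input studyid b
2116 90401
2116 90402
end
sort studyid
save fileB, replace

* Build a hypothetical file A with one observation for studyid 2116
clear
input studyid a1 a2 a3 a4
2116 4 5 2 9
end
sort studyid

* Merge B into A; the single master observation is paired with both
* observations in B, so the result should have two observations
merge studyid using fileB
list
```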

What -merge- did in this case should make sense to you.  File B had two
observations with studyid=2116, so -merge- duplicated the single studyid=2116
observation in File A and then merged.  This can be of great use.  For
instance, one might have a file of persons, and in the person file is recorded
the state in which they live.  One might have another file of state
characteristics.  One could merge the two files by state and then have a file
of persons, with characteristics of states appropriately duplicated.
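
The person/state merge just described might be sketched as follows.  The file names persons and states and the key variable state are hypothetical, and the syntax is the old-style -merge-, which expects both files to be sorted by the key.

```stata
* Hypothetical files: persons.dta (one obs per person, contains state)
*                     states.dta  (one obs per state, state characteristics)
use states, clear
sort state
save states, replace

use persons, clear
sort state

* Each state's characteristics are repeated for every person in that state
* (newer Stata releases write this as: merge m:1 state using states)
merge state using states
tabulate _merge
```

Here the repetition is intended, so -unique- would not be specified.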

In most cases, however, the id variable is unique, or supposed to be unique,
and in those cases, I recommend specifying -merge-'s option -unique-.  It 
will not solve the problem, but it will look for the problem and issue an 
error message if it finds it.  If that is the problem, then the question 
becomes how File B (or File A) ended up with a duplicate observation when 
it should not have.
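
One way to look for the duplicate before merging, and to have -merge- check for it, is sketched below.  The file names fileA and fileB are hypothetical, and both are assumed to be sorted by studyid already; -isid- and -duplicates- are standard Stata commands, but which file holds the duplicate is something only the data can tell you.

```stata
* Check the using file (repeat the same checks on fileA)
use fileB, clear

* isid exits with an error if studyid does not uniquely identify
* observations; capture noisily shows the error but lets the do-file continue
capture noisily isid studyid

* Count the duplicates, list them, and flag them for inspection
duplicates report studyid
duplicates list studyid
duplicates tag studyid, generate(dup)
list if dup > 0

* With -unique-, merge stops with an error message instead of silently
* duplicating observations when studyid is repeated in either file
use fileA, clear
merge studyid using fileB, unique
```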

-- Bill
[email protected]