|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: merge by nearest date
Joseph,
As far as I know there are no facilities in Stata for doing fuzzy
matching. One possibility is to rename the visit variable in file1
(master) to visit1 and in file2 (using) to visit2, -joinby idcode-,
calculate diff = abs(visit1 - visit2) and drop all records where diff >
30 or some other threshold. Do -duplicates report idcode visit2-
afterwards to see if you've got the same file2 record attached to more
than one file1 record (assuming that there are no duplicate visit dates
in file2) and inspect these to see what you can do to fix them (e.g., by
changing the date threshold or by adding a visit3 = visit2 that you
alter in order to encourage/discourage specific matches).
Michael
Joseph Wagner wrote:
I am using Stata ver 8.2 and wish to merge two files each with an idcode
and a date. (I know that more recent versions of Stata can handle this
more capably but funds are short at this time so I make do with what I
have.) None of the dates in using matches exactly with master. I get
the following:
. tab _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
1 | 585 1.82 1.82
2 | 31,554 98.18 100.00
------------+-----------------------------------
Total | 32,139 100.00
This is what I expected but I would like to see how close I can get a
pair (master and using) that are within one week of one another. If I
calculate a simple difference between master and the next using above
will miss any closer using if it is below the master. I hope this makes
sense. Here is a snippet of the data:
idcode visit _merge
FEL00057 26 Sep 33 2
FEL00057 30 Sep 33 1
FEL00057 22 Nov 33 2
FEL00057 24 Jan 34 2
FEL00057 28 May 34 2
FEL00057 03 Aug 34 2
FEL00057 23 Sep 34 1
FEL00057 24 Sep 34 2
FEL00057 27 Nov 34 2
FEL00057 28 Jan 35 2
FEL00057 27 Mar 35 1
FEL00057 30 Mar 35 2
If my data is arranged in this fashion, how do I best get the closest
pair of using and master?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/