Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Merging data sets based on a range of dates
From
Phil Schumm <[email protected]>
To
<[email protected]>
Subject
Re: st: Merging data sets based on a range of dates
Date
Tue, 6 Aug 2013 15:17:24 -0500
On Aug 6, 2013, at 1:14 PM, Joe Canner <[email protected]> wrote:
> From time to time I have a situation like the following:
>
> Surgery.dta: PatientID SurgeryDate
>
> Hospital.dta: PatientID AdmissionDate DischargeDate
>
> There are usually multiple records per PatientID in both data sets and I would like to match any records in Surgery.dta with corresponding records in Hospital.dta where SurgeryDate is between AdmissionDate and DischargeDate.
>
> Typically, my solutions are inelegant and inefficient, usually involving -reshape- (which is very slow when the number of records per PatientID is large). Does anyone know of a good method for doing this efficiently in Stata, or at least a user-written program that can easily be invoked to do this?
Sarah's suggestion is definitely the standard way of approaching this type of problem, and typically leads to the most readable code. Thus, it is to be preferred in most cases. However, if you're in a situation where the number of observations created by -joinby- is unwieldy (you indicated that the number of records per patient might be large) or if you want to preserve records corresponding to visits without a surgery, you could use something like
use Hospital
append using Surgery
gen date = cond(!mi(AdmissionDate),AdmissionDate,SurgeryDate)
bys PatientID (date AdmissionDate): replace SurgeryDate = SurgeryDate[_n+1] ///
if inrange(SurgeryDate[_n+1],AdmissionDate,DischargeDate)
drop if mi(AdmissionDate)
where I am assuming the following preconditions on the Hospital dataset
ass !mi(AdmissionDate,DischargeDate) & AdmissionDate<=DischargeDate
bys PatientID (AdmissionDate): ass AdmissionDate!=AdmissionDate[_n-1]
bys PatientID (AdmissionDate): ass AdmissionDate>=DischargeDate[_n-1] ///
if _n>1
To check for missing visit records or multiple surgeries during the same visit, you could use
bys PatientID (date AdmissionDate): ass !mi(AdmissionDate[_n-1],SurgeryDate[_n-1]) ///
if mi(AdmissionDate)
before dropping the surgery records.
-- Phil
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/