Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: Merge command as date lookup tool
From
Menno Kiel <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: Merge command as date lookup tool
Date
Wed, 16 Apr 2014 10:53:27 +0000
L.S.,
I have a question about using the merge command as a means of performing a date lookup to retrieve a value attached to the applicable date range.
The variable that needs to be looked up is called: Date (=the questionnaire date of a patient), and the goal of the procedure is to attach a predefined time interval (TI) number to the specific Date. The difficulty is that multiple patient groups exist: 21 to be exact (see P.S. below). Within these patient groups, the dates do not overlap, but if you don't take the patient groups into account, they do, resulting in an error related to the non-unique character of the merge variables.
The master dataset is named: AIRFORCE_C_E_Base.dta with the variables ID (patient ID), Date (questionnaire date), patient characteristics (most importantly Allergen and randomization year) and a lot of cost and effects variables associated with this specific date.
The lookup dataset is names: TI_dates.dta (where TI stands for time interval, with a start [TI_startdate ] and an end date[TI_enddate]. ) (TI ranges from 0 through MAXIMALLY 7, so potentially eight).
In 2011, Phil Schumm ([email protected]) responded to a similar case on STATAlist, but my situation is slightly more complicated due to the fact that the start and enddates of the lookup ranges are only unique WITHIN any of the 21 groups. The original post can be found here: http://www.stata.com/statalist/archive/2011-06/msg00627.html
I figured the simple method he described in the email using the cross command is not applicable here, therefore a two-step merge procedure is in order. He suggests:
use dataset1
merge 1:1 _n using dataset2, keepusing(start_x end_x) nogen
gen start = .
gen end = .
forv i=1/`c(N)' {
if mi(start_x[`i']) continue, break
replace start = start_x[`i'] if inrange(x,start_x[`i'],end_x[`i'])
replace end = end_x[`i'] if inrange(x,start_x[`i'],end_x[`i'])
}
drop start_x end_x
ren start start_x
ren end end_x
merge m:1 start_x end_x using dataset2
Is there a way (e.g. using a while-loop) to perform this procedure for any of the 21 different combinations of random_year and Allergen(_codes)? I figured that start_x corresponds to TI_startdate and end_x to TI_enddate. X must be the variable to be lookedup: Date.
However, running an adapted version of the syntax above for 21 times seems a bit unneccesary.
I'd be more than willing to elaborate. Also, I can send an anonymized version of the datasets. Your help is much appreciated.
Best,
Menno Kiel
Rotterdam, The Netherlands
P.S.
(Explanation of the patient groups: The start and enddates are different for any of the 3 randomization years (Random_year: 2009, 2010, 2011) and 7 Allergens (T, G, H, TG, TH, GH or TGH); this variable is stored as a string ('Allergen') or numerical ('Allergen_code') = 3*7= 21 theoretical combinations of Allergen and Random_year, each with different sets of lookup ranges. I have created a concatenated variable of allergen letter and randomization year (e.g. T2009 for a tree pollen patient randomized in 2009) in the master dataset for future use, reducing the number of key variable from 2 to one.)
______________________________________
Menno A. Kiel, MD MSc
research physician / health economist
Institute for Medical Technology Assessment
Dept. of Health Economics (GE-iMTA)
Erasmus University Rotterdam
Woudestein Campus, r. J5-69
PO Box 1738
50 Burg. Oudlaan, 3000DR Rotterdam
The Netherlands
W1: +31(0)10-4088885 (ma-do-vrij)
W2: +31(0)10-7945523 (di-woe)
F: +31(0)10-4089094
M: +31(0)6-81501516
E: [email protected]
I: www.imta.nl
Disclaimer
________________________________
De informatie verzonden in dit e-mail bericht is vertrouwelijk en is uitsluitend bestemd voor de geadresseerde van dit bericht. Lees verder: www.eur.nl/email-disclaimer <http://www.eur.nl/email-disclaimer>
The information in this e-mail message is confidential and may be legally privileged. Read more: www.eur.nl/english/email-disclaimer<http://www.eur.nl/english/email-disclaimer>
________________________________
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/