Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Menno Kiel <kiel@bmg.eur.nl> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: Merge command as date lookup tool |
Date | Wed, 16 Apr 2014 10:53:27 +0000 |
L.S., I have a question about using the merge command as a means of performing a date lookup to retrieve a value attached to the applicable date range. The variable that needs to be looked up is called: Date (=the questionnaire date of a patient), and the goal of the procedure is to attach a predefined time interval (TI) number to the specific Date. The difficulty is that multiple patient groups exist: 21 to be exact (see P.S. below). Within these patient groups, the dates do not overlap, but if you don't take the patient groups into account, they do, resulting in an error related to the non-unique character of the merge variables. The master dataset is named: AIRFORCE_C_E_Base.dta with the variables ID (patient ID), Date (questionnaire date), patient characteristics (most importantly Allergen and randomization year) and a lot of cost and effects variables associated with this specific date. The lookup dataset is names: TI_dates.dta (where TI stands for time interval, with a start [TI_startdate ] and an end date[TI_enddate]. ) (TI ranges from 0 through MAXIMALLY 7, so potentially eight). In 2011, Phil Schumm (pschumm@uchicago.edu) responded to a similar case on STATAlist, but my situation is slightly more complicated due to the fact that the start and enddates of the lookup ranges are only unique WITHIN any of the 21 groups. The original post can be found here: http://www.stata.com/statalist/archive/2011-06/msg00627.html I figured the simple method he described in the email using the cross command is not applicable here, therefore a two-step merge procedure is in order. He suggests: use dataset1 merge 1:1 _n using dataset2, keepusing(start_x end_x) nogen gen start = . gen end = . forv i=1/`c(N)' { if mi(start_x[`i']) continue, break replace start = start_x[`i'] if inrange(x,start_x[`i'],end_x[`i']) replace end = end_x[`i'] if inrange(x,start_x[`i'],end_x[`i']) } drop start_x end_x ren start start_x ren end end_x merge m:1 start_x end_x using dataset2 Is there a way (e.g. using a while-loop) to perform this procedure for any of the 21 different combinations of random_year and Allergen(_codes)? I figured that start_x corresponds to TI_startdate and end_x to TI_enddate. X must be the variable to be lookedup: Date. However, running an adapted version of the syntax above for 21 times seems a bit unneccesary. I'd be more than willing to elaborate. Also, I can send an anonymized version of the datasets. Your help is much appreciated. Best, Menno Kiel Rotterdam, The Netherlands P.S. (Explanation of the patient groups: The start and enddates are different for any of the 3 randomization years (Random_year: 2009, 2010, 2011) and 7 Allergens (T, G, H, TG, TH, GH or TGH); this variable is stored as a string ('Allergen') or numerical ('Allergen_code') = 3*7= 21 theoretical combinations of Allergen and Random_year, each with different sets of lookup ranges. I have created a concatenated variable of allergen letter and randomization year (e.g. T2009 for a tree pollen patient randomized in 2009) in the master dataset for future use, reducing the number of key variable from 2 to one.) ______________________________________ Menno A. Kiel, MD MSc research physician / health economist Institute for Medical Technology Assessment Dept. of Health Economics (GE-iMTA) Erasmus University Rotterdam Woudestein Campus, r. J5-69 PO Box 1738 50 Burg. Oudlaan, 3000DR Rotterdam The Netherlands W1: +31(0)10-4088885 (ma-do-vrij) W2: +31(0)10-7945523 (di-woe) F: +31(0)10-4089094 M: +31(0)6-81501516 E: kiel@bmg.eur.nl I: www.imta.nl Disclaimer ________________________________ De informatie verzonden in dit e-mail bericht is vertrouwelijk en is uitsluitend bestemd voor de geadresseerde van dit bericht. Lees verder: www.eur.nl/email-disclaimer <http://www.eur.nl/email-disclaimer> The information in this e-mail message is confidential and may be legally privileged. Read more: www.eur.nl/english/email-disclaimer<http://www.eur.nl/english/email-disclaimer> ________________________________ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/