Thank you Nick! As always your advice is sound. I found your example and the
faq to be extremely helpful.
I ran through the logic and realized that I would probably stop after the
"sort id event" since at that point I can see when the first fill was
completed. The only other thing I would modify here is to generate a new
variable in which I would move up the fill date to the row of the last
visit.
Thanks again!
Ariel
Date: Tue, 20 Oct 2009 21:55:43 +0100
From: "Nick Cox" <[email protected]>
Subject: st: RE: RE: merging values of one variable to to the nearest value
in another dataset (stata v10.1)
A better way to fill in next visits is to reverse time. See much more
detail at http://www.stata.com/support/faqs/data/missing.html
Nick
[email protected]
- -----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: 20 October 2009 21:35
To: [email protected]
Subject: st: RE: merging values of one variable to to the nearest value
in another dataset (stata v10.1)
There are various answers to this question. One is to think in terms of
- -append- rather than -merge-.
Consider a strategy starting with appended files:
. l
+----------------------------+
| id dateofv~t dateoff~l |
|----------------------------|
1. | 1 1/1/2009 |
2. | 1 1/31/2009 |
3. | 1 2/18/2009 |
4. | 1 3/14/2009 |
5. | 1 5/30/2009 |
|----------------------------|
6. | 1 7/12/2009 |
7. | 2 3/10/2009 |
8. | 2 4/25/2009 |
9. | 2 5/20/2009 |
10. | 3 2/09/2009 |
|----------------------------|
11. | 3 6/08/2009 |
12. | 3 7/10/2009 |
13. | 1 2/22/2009 |
14. | 2 4/15/2009 |
15. | 3 7/15/2009 |
+----------------------------+
We first convert our date variables to numeric dates:
. gen visit = date(dateofvisit, "MDY")
(3 missing values generated)
. gen fill = date(dateoffill, "MDY")
(12 missing values generated)
Then combine our dates rowwise:
. gen event = max(visit, fill)
- -- and get data on next and previous visits
. sort id event
. gen lastvisit = visit
(3 missing values generated)
. by id : replace lastvisit = lastvisit[_n-1] if missing(lastvisit)
(3 real changes made)
. gen nextvisit = visit
(3 missing values generated)
. by id : replace nextvisit = nextvisit[_n+1] if missing(nextvisit)
(2 real changes made)
- -- and so on and so forth. So for each visit you can find the nearest
fill, and vice versa -- and keep track of the separation involved.
Nick
[email protected]
Ariel Linden
I have a dataset in long format which has dates when an individual
visited a
doctor. I have another dataset with dates when that patient started
taking a
medication. Naturally, the dates will not align exactly between the two
data
sets.
Is there a way of merging the files so that the date when the patient
started taking the medication is aligned (put on the same line) to the
closest date (but after) of a given doctor visit?
Example
File 1 - office visits
ID date of visit
1 1/1/2009
1 1/31/2009
1 2/18/2009
1 3/14/2009
1 5/30/2009
1 7/12/2009
2 3/10/2009
2 4/25/2009
2 5/20/2009
3 2/09/2009
3 6/08/2009
3 7/10/2009
File 2 - medication fill date
ID date of first fill
1 2/22/2009
2 4/15/2009
3 7/15/2009
If I would like the medication date to be added to the office visit file
to
be matched to the last closest office visit, then ID 1 medication date
should align with 1/31/2009, ID 2 should align with 3/10/2009 and ID 3
should align with 7/10/2009.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/