Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Handling pharmacy data with multiple entries per subject
From
Phil Schumm <[email protected]>
To
[email protected]
Subject
Re: st: Handling pharmacy data with multiple entries per subject
Date
Mon, 13 Jun 2011 11:15:25 -0500
On Jun 10, 2011, at 5:05 PM, Doernberg, Sarah wrote:
Thank you for your responses. I'm attempting to be more specific
about my question below. My data is currently formatted as follows
(with an example entry):
ID drug_name start_date stop_date outcome
1 ceftriaxone 5/15/2001 5/17/2001 5/31/2001
1 ceftriaxone 5/19/2001 5/20/2001 5/31/2001
1 ceftriaxone 5/20/2001 5/24/2001 5/31/2001
1 ceftriaxone 7/24/2001 7/27/2001 .
This one person had 3 prescriptions for ceftriaxone during a
hospitalization in May, 2001, including one day where the person was
not given this drug (5/15-5/17 and 5/19-5/24). There was another
hospitalization in July, 2001, where another prescription was given.
The patient did not experience the outcome during the second
hospitalization. The dataset only contains information about two
different drugs.
My ultimate goal is to figure out the number of days each person
received each drug during a 30-day period from the first day of
receipt or before the date of the outcome (if <30 days from the
start of antibiotic) to allow for a logistic regression with
exposure = antibiotic days. In addition, I may also do a survival
analysis using start of antibiotic as the start date, development of
the outcome as failure, and censoring at 30 days in those without
the outcome.
The first thing to do with a dataset like this is to run some checks
to get a sense of what you're dealing with. For example (assuming
that start_date, stop_date and outcome are all Stata-format dates):
ass start_date <= stop_date
isid id drug_name start_date
bys id drug_name (start_date): ass stop_date[_n-1] <= start_date
if 1<_n
Taken together, these will verify that your start/stop dates are
properly ordered, and that the periods of administration for a given
drug do not overlap. There may be other things that you'll want to
check too. Whether you "fix" any problems that show up (or ask for an
updated data file) first or attempt to code around them will depend on
the situation.
Often with problems like this, generating summaries is easier (or at
least more readable) if you transform the data so that you have one
record for each day. You may not be able to do this if you have a
really big dataset, but don't be afraid to use your RAM if you've got
it. For example, the following code will expand out your periods so
that you have one record for each day of drug administration
(separately by drug):
expand stop_date - start_date + 1
bys id drug_name start_date: gen date = start_date if _n==1
bys id drug_name start_date (date): replace date = date[_n-1] + 1
if 1<_n
Now, you probably want a record corresponding to the date of the
outcome (if you don't already have one), which you can generate (for a
single drug, using the data in your example) with:
gen fail = (outcome == date)
bys id outcome (fail): gen byte n = (_n==1 & !fail[_N] & !
mi(outcome)) + 1
expand n, gen(added)
replace date = outcome if added
replace drug_name = "" if added
replace fail = 1 if added
Finally, you may need to clean up any duplicates that were generated
above:
keep id drug_name date fail
duplicates drop
isid id date
From the resulting dataset, you can easily get what you need for a
survival analysis (using cumulative drug exposure as a time-varying
covariate), or generate the 30-day periods you describe. You'll
probably need information on the dates of hospital admission and
discharge to do this.
Note that for simplicity, I have essentially ignored the fact that you
have multiple drugs here (except for the obvious stratification by
drug_name in the first two code blocks). Ultimately, you'll probably
want to have one record per date on which *any* drug was administered
(plus additional records, if necessary, for dates of hospital
admission and discharge, and date of outcome), with a separate 0/1
column for each drug indicating whether that drug was administered on
that day. One way to achieve this would be to do what I have shown
above separately for each of your two drugs, and then merge the
results together (by id and date). The outcome data (as you have
shown them) are in an odd format, and how you handle them ultimately
will depend on what they look like for the other drug (i.e., are the
same outcome dates repeated)? It may well be easiest just to peel
those off and deal with them separately (like the two drugs), and then
merge the outcome dates back on at the end.
Hopefully this'll give you some ideas/strategies that you can use.
-- Phil
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/