st: sequential subscript processing
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: st: sequential subscript processing
Date: Wed, 27 Mar 2013 09:04:16 -0500

This is a question about efficiency. The code I've written produces
the output I need; it just seems to me that it could be improved.
Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
obs[2] _after_ obs[2] has been conditionally changed by the values in
obs[1]. For context, the goal is to "chain" prescription refills
together to calculate 180-day medication possession ratios. Everyone
in the data has at least one refill. For any of you who work with
MPRs, don't panic: this isn't the extent of the calculation or the
rules. I'm using "refill" loosely; it includes titrations. The goal
with this example was to capture the essential issue with the dates.
Definitions:
"dispensing date" - date the pharmacy provides the medication to the patient
"fill" - a distinct dispensing date+medication combination
"refill date" - when the medication is projected to be filled again
"days supply" - the number of days for which the prescription provides
medication (usually 30, 60, or 90)
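To make the definitions concrete, here is a minimal sketch using the first two fills for ptdrugid 14 from the sample data below. It assumes the same convention as my code, where the refill date is computed as dispensing date + days supply - 1. The variable names here are just for illustration.

```stata
* First two fills for ptdrugid 14 from the sample data
clear
input dispdt daysuppl
18000 30
18031 30
end

* Refill date under the convention used in my code
gen refilldt = dispdt + daysuppl - 1

format dispdt refilldt %td
list, noobs

* The first fill runs through day 18029, so the second fill (day 18031)
* does not overlap it
assert refilldt == 18029 in 1
assert dispdt[2] - refilldt[1] == 2
```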
The rules are:
1. If a patient's refill overlaps the previous fill by more than 20%
of the previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date, adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)) less the number of days of overlap. I.e. truncate
the previous fill's days supplied & assume use of the refill starts on
the day it is dispensed.
2. If a patient's refill overlaps the previous fill by <= 20% of the
previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date, adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)). I.e. shift dispensing date of refill to the end of
the previous fill.
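As a minimal sketch of the two rules (this is not my actual code): the pairs of fills are made up for illustration, and the variable names (prevdisp, prevsupp, curdisp, cursupp, and so on) are hypothetical. The overlap is measured as the current dispensing date minus the previous fill's refill date, with refill date = dispensing date + days supply - 1.

```stata
* Hypothetical pairs of fills (made-up dates, not from my data)
clear
input pair prevdisp prevsupp curdisp cursupp
1 18000 30 18020 30
2 18000 30 18026 30
end

* Previous fill's refill date: dispensing date + days supply - 1
gen prevrefill = prevdisp + prevsupp - 1

* gap <= 0 means the current fill starts on or before the previous refill date
gen gap = curdisp - prevrefill

* Rule 1 if the overlap exceeds 20% of the previous days supply, else rule 2
gen byte rule = cond(abs(gap) > 0.2*prevsupp, 1, 2) if gap <= 0

* Chained days supply: rule 1 subtracts the overlap (gap is negative),
* rule 2 simply adds the two supplies
gen newsupp = prevsupp + cursupp + gap*(rule == 1) if gap <= 0

* The chained fill starts at the previous dispensing date
gen newdisp = prevdisp if gap <= 0
gen newrefill = newdisp + newsupp - 1

format prevdisp curdisp prevrefill newdisp newrefill %td
list pair gap rule newsupp newdisp newrefill, noobs

* Checks on the arithmetic
assert gap == -9 & rule == 1 & newsupp == 51 if pair == 1
assert gap == -3 & rule == 2 & newsupp == 60 if pair == 2
```

In pair 1 the 9-day overlap exceeds 6 days (20% of 30), so the chained supply is 30 + 30 - 9 = 51 days. In pair 2 the 3-day overlap does not, so the chained supply is 30 + 30 = 60 days.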
I think I've got a good start on this with -forvalues- and -while-.
I've put a sample of the data below. As a note, this data has been
de-identified before posting. The dates have been jittered from the
real dates, but I've replicated all of the major features. The
variable "ptdrugid" was created from -egen ptdrugid = group(ptid
shortnm)-.
** begin code **
clear
input ptdrugid _dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 16877 30
15 16903 30
15 16952 30
15 16987 30
15 17010 30
15 17047 30
15 17073 30
15 17093 30
15 17132 30
15 17165 30
15 17194 30
15 17224 30
15 17249 30
15 17286 30
15 17327 30
15 17357 30
15 17385 30
15 17413 30
15 17445 30
15 17474 30
15 17500 30
15 17534 30
15 17568 30
15 17597 30
15 17620 30
15 17645 30
15 17669 30
15 17702 30
15 17728 30
15 17758 30
15 17796 30
15 17818 30
15 17861 30
15 17898 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
15 18002 30
15 18032 30
15 18075 30
15 18096 30
15 18107 90
15 18190 90
end
gen _refilldt = _dispdt+daysuppl-1
format _dispdt _refilldt %td
clonevar dispdt = _dispdt
clonevar refilldt = _refilldt
bys ptdrugid (_dispdt _refilldt): gen _seq = _n
sum _seq, meanonly
local nmax = `r(max)'
gen chng = 0
clonevar mdaysup = daysuppl
forvalues j = 2/`nmax' {
    * flag fill j if it starts on or before the previous fill's refill date
    by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
    * chain days supply; subtract the overlap only if it exceeds 20%
    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
        (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > ///
        0.2*mdaysup[_n-1]) if chng
    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
    replace refilldt = dispdt + mdaysup - 1
    * drop the previous fill now that it has been absorbed
    by ptdrugid: drop if chng[_n+1]==1
    * re-check position j, which now holds the next fill
    by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
    sum chng, meanonly
    if `r(sum)' > 0 {
        local x 1
        * repeat until no fill at position j overlaps its predecessor
        while `x' > 0 {
            by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
                (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
                0.2*mdaysup[_n-1]) ///
                if chng
            by ptdrugid: replace dispdt = dispdt[_n-1] if chng
            replace refilldt = dispdt + mdaysup - 1
            by ptdrugid: drop if chng[_n+1]==1
            by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
            sum chng, meanonly
            local x = `r(sum)'
        }
    }
}
exit
** end code **
To my way of thinking, this is horribly inefficient. Among the issues
that are immediately apparent to me: (1) once `nmax' has been set, it
isn't altered, even though the number of observations winds up being
far smaller as fills are chained (so the loop runs more times than
needed), and (2) I continue looping over observations once they've
been maximally condensed.
Does anyone have any suggestions for making this code better?
Thanks,
Rebecca