st: sequential subscript processing

From	Rebecca Pope <[email protected]>
To	[email protected]
Subject	st: sequential subscript processing
Date	Wed, 27 Mar 2013 09:04:16 -0500

This is a question about efficiency. The code I've written produces
the output I need; it just seems to me that it could be improved.

Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
obs[2] _after_ obs[2] has been conditionally changed by the values in
obs[1]. For context, the goal is to "chain" prescription refills
together to calculate 180-day medication possession ratios. Everyone
in the data has at least one refill. For any of you who work with
MPRs, don't panic: this isn't the extent of the calculation or the
rules. I'm using "refill" loosely; it includes titrations. The goal
with this example was to capture the essential issue with the dates.
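
To make the distinction concrete, here is a minimal sketch on toy data (the variable x is hypothetical and is not part of the data below). It relies on -replace- working through the observations in order, so that x[_n-1] on the right-hand side already holds the value written for the previous observation:

```stata
* toy data: four observations, all equal to 1 (x is a hypothetical variable)
clear
set obs 4
gen x = 1
* obs 2 uses the original x[1]; obs 3 uses the updated x[2]; and so on
replace x = x[_n-1] + x in 2/l
list x
```

The result is 1, 2, 3, 4. If the subscript referred to the original values instead, every observation after the first would be 2.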

Definitions:
"dispensing date" - date the pharmacy provides the medication to the patient
"fill" - a distinct dispensing date+medication combination
"refill date" - when the medication is projected to be filled again
"days supply" - the number of days for which the prescription provides
medication (usually 30, 60, or 90)

The rules are:
1. If a patient's refill overlaps the previous fill by more than 20%
of the previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date, adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)) less the number of days of overlap. I.e. truncate
the previous fill's days supplied & assume use of the refill starts on
the day it is dispensed.

2. If a patient's refill overlaps the previous fill by <= 20% of the
previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date, adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)). I.e. shift dispensing date of refill to the end of
the previous fill.
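
As a worked illustration of the two rules, the sketch below applies them to two isolated pairs of fills taken from the sample data further down (the first two fills for ptdrugid 15, and the fills dispensed on 17952 and 17971). The variables pair, gap, rule1 and combined are hypothetical names used only here, and overlap is measured against the previous fill's refill date (dispensing date + days supply - 1), which is an assumption carried over from the code below.

```stata
clear
input pair _dispdt daysuppl
1 16877 30
1 16903 30
2 17952 30
2 17971 30
end
* last day covered by each fill
gen _refilldt = _dispdt + daysuppl - 1
* gap <= 0 means the refill was dispensed on or before that last day
bysort pair (_dispdt _refilldt): gen gap = _dispdt - _refilldt[_n-1]
* rule 1 applies when the overlap exceeds 20% of the previous days supply
by pair: gen byte rule1 = gap <= 0 & abs(gap) > 0.2*daysuppl[_n-1] if _n == 2
* rule 1 subtracts the overlap; rule 2 simply adds the two supplies
by pair: gen combined = daysuppl[_n-1] + daysuppl + gap*rule1 if _n == 2 & gap <= 0
list, sepby(pair)
```

For pair 1 the overlap is 3 days (not more than 6), so rule 2 gives 60 days; for pair 2 the overlap is 10 days (more than 6), so rule 1 gives 30 + 30 - 10 = 50 days.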

I think I've got a good start on this with -forvalues- and -while-.
I've put a sample of the data below. As a note, these data have been
de-identified before posting. The dates have been jittered from the
real dates, but I've replicated all of the major features. The
variable "ptdrugid" was created from -egen ptdrugid = group(ptid
shortnm)-.

** begin code **
clear
input    ptdrugid   _dispdt   daysuppl
          14     18000         30
          14     18031         30
          14     18128         30
          15     16877         30
          15     16903         30
          15     16952         30
          15     16987         30
          15     17010         30
          15     17047         30
          15     17073         30
          15     17093         30
          15     17132         30
          15     17165         30
          15     17194         30
          15     17224         30
          15     17249         30
          15     17286         30
          15     17327         30
          15     17357         30
          15     17385         30
          15     17413         30
          15     17445         30
          15     17474         30
          15     17500         30
          15     17534         30
          15     17568         30
          15     17597         30
          15     17620         30
          15     17645         30
          15     17669         30
          15     17702         30
          15     17728         30
          15     17758         30
          15     17796         30
          15     17818         30
          15     17861         30
          15     17898         30
          15     17934         30
          15     17934         10
          15     17952         30
          15     17971         30
          15     18002         30
          15     18032         30
          15     18075         30
          15     18096         30
          15     18107         90
          15     18190         90
end
gen _refilldt = _dispdt+daysuppl-1
format _dispdt _refilldt %td
clonevar dispdt = _dispdt
clonevar refilldt = _refilldt
bys ptdrugid (_dispdt _refilldt): gen _seq = _n
sum _seq, meanonly
local nmax = `r(max)'
gen chng = 0
clonevar mdaysup = daysuppl
forvalues j = 2/`nmax' {
    * flag the j-th fill in each group if it is dispensed on or before
    * the (already chained) previous fill's refill date
    by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
    * add the days supply; subtract the overlap only when it exceeds 20%
    * of the previous fill's days supply
    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
        (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
        if chng
    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
    replace refilldt = dispdt + mdaysup - 1
    * drop the previous fill now that it has been absorbed
    by ptdrugid: drop if chng[_n+1]==1
    * after the drop the next fill moves into position j; flag it if it
    * also overlaps
    by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
    sum chng, meanonly
    if `r(sum)' > 0 {
        local x 1
        * keep chaining at position j until no group has an overlap there
        while `x' > 0 {
            by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
                (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
                if chng
            by ptdrugid: replace dispdt = dispdt[_n-1] if chng
            replace refilldt = dispdt + mdaysup - 1
            by ptdrugid: drop if chng[_n+1]==1
            by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
            sum chng, meanonly
            local x = `r(sum)'
        }
    }
}
exit
** end code **

To my way of thinking, this is horribly inefficient. Two issues are
immediately apparent to me: (1) once `nmax' has been set, it is never
updated, even though the number of observations winds up being far
smaller as fills are chained, so the loop makes too many passes; and
(2) I keep looping over observations after they have been maximally
condensed.

Does anyone have any suggestions for making this code better?

Thanks,
Rebecca