Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: sequential subscript processing
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: sequential subscript processing
Date: Thu, 28 Mar 2013 01:02:18 +0000
I've looked through this code. My only strategic suggestion is that it
might be simpler with a single loop over observations that checks
whether each observation has the same id as the previous one. But a
single loop over observations can be notoriously slow, and that is
what you are trying to avoid.
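
A minimal sketch of what such a single loop might look like, not tested
against your full rules. It assumes your variable names and your 20%
rule as coded, uses a cut-down version of your example data, and marks
absorbed fills in a hypothetical indicator -todrop- rather than
dropping them inside the loop.

```stata
clear
input ptdrugid dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
end
gen mdaysup = daysuppl
gen refilldt = dispdt + mdaysup - 1
sort ptdrugid dispdt refilldt
gen byte todrop = 0    // hypothetical marker for fills absorbed into the next one
local N = _N
forvalues i = 2/`N' {
    local p = `i' - 1
    * only chain within the same patient-drug combination
    if ptdrugid[`i'] == ptdrugid[`p'] {
        local gap = dispdt[`i'] - refilldt[`p']
        if `gap' <= 0 {
            * add the (non-positive) gap only if its size exceeds 20%
            * of the previous fill's (already chained) days supplied
            local trunc = cond(abs(`gap') > 0.2*mdaysup[`p'], `gap', 0)
            quietly replace mdaysup = mdaysup[`p'] + mdaysup + (`trunc') in `i'
            quietly replace dispdt = dispdt[`p'] in `i'
            quietly replace refilldt = dispdt + mdaysup - 1 in `i'
            quietly replace todrop = 1 in `p'
        }
    }
}
drop if todrop
list ptdrugid dispdt mdaysup refilldt
```

On these seven fills, id 14 keeps its three fills and id 15 collapses
to a single fill with dispdt 17934, mdaysup 68 and refilldt 18001.
Because observation `i'-1 already holds everything chained so far, one
forward pass is enough, but each -replace ... in- touches a single
observation, which is where the slowness comes from.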
Your problem seemed similar in some ways to those discussed in

SJ-7-3  pr0033  Stata tip 51: Events in intervals
        N. J. Cox
        Q3/07   SJ 7(3):440--443   (no commands)
        tip for counting or summarizing irregularly spaced
        events in intervals
On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <[email protected]> wrote:
> This is a question about efficiency. The code I've written produces
> the output I need; it just seems to me that it could be improved.
>
> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
> obs[2] _after_ obs[2] has been conditionally changed by the values in
> obs[1]. For context, the goal is to "chain" prescription refills
> together to calculate 180-day medication possession ratios. Everyone
> in the data has at least one refill. For any of you who work with
> MPRs, don't panic: this isn't the extent of the calculation or the
> rules. I'm using "refill" loosely; it includes titrations. The goal
> with this example was to capture the essential issue with the dates.
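
For a single variable, comparing with the previous observation _after_
it has been changed is what -replace- does anyway, because it works
through the observations in order. A toy illustration, with a made-up
variable x; it does not by itself handle several variables that must be
updated together, as in your problem.

```stata
clear
set obs 5
gen x = 1
* replace is evaluated observation by observation, so x[_n-1] on the
* right-hand side is the value already updated earlier in this pass
replace x = x[_n-1] + 1 in 2/l
list x
```

Here x ends up as 1, 2, 3, 4, 5 rather than 1, 2, 2, 2, 2.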
>
> Definitions:
> "dispensing date" - date the pharmacy provides the medication to the patient
> "fill" - a distinct dispensing date+medication combination
> "refill date" - when the medication is projected to be filled again
> "days supply" - the number of days for which the prescription provides
> medication (usually 30, 60, or 90)
>
> The rules are:
> 1. If a patient's refill overlaps the previous fill by more than 20%
> of the previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date, adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)) less the number of days of overlap. I.e. truncate
> the previous fill's days supplied & assume use of the refill starts on
> the day it is dispensed.
>
> 2. If a patient's refill overlaps the previous fill by <= 20% of the
> previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date, adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)). I.e. shift dispensing date of refill to the end of
> the previous fill.
>
> I think I've got a good start on this with -forvalues- and -while-.
> I've put a sample of the data below. As a note, this data has been
> de-identified before posting. The dates have been jittered from the
> real dates, but I've replicated all of the major features. The
> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
> shortnm)-.
>
> ** begin code **
> clear
> input ptdrugid _dispdt daysuppl
> 14 18000 30
> 14 18031 30
> 14 18128 30
> 15 16877 30
> 15 16903 30
> 15 16952 30
> 15 16987 30
> 15 17010 30
> 15 17047 30
> 15 17073 30
> 15 17093 30
> 15 17132 30
> 15 17165 30
> 15 17194 30
> 15 17224 30
> 15 17249 30
> 15 17286 30
> 15 17327 30
> 15 17357 30
> 15 17385 30
> 15 17413 30
> 15 17445 30
> 15 17474 30
> 15 17500 30
> 15 17534 30
> 15 17568 30
> 15 17597 30
> 15 17620 30
> 15 17645 30
> 15 17669 30
> 15 17702 30
> 15 17728 30
> 15 17758 30
> 15 17796 30
> 15 17818 30
> 15 17861 30
> 15 17898 30
> 15 17934 30
> 15 17934 10
> 15 17952 30
> 15 17971 30
> 15 18002 30
> 15 18032 30
> 15 18075 30
> 15 18096 30
> 15 18107 90
> 15 18190 90
> end
> gen _refilldt = _dispdt+daysuppl-1
> format _dispdt _refilldt %td
> clonevar dispdt = _dispdt
> clonevar refilldt = _refilldt
> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
> sum _seq, meanonly
> local nmax = `r(max)'
> gen chng = 0
> clonevar mdaysup = daysuppl
> forvalues j = 2/`nmax' {
>     * flag fill j if it is dispensed on or before the previous refill date
>     by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>     * chain days supplied; add the (non-positive) gap only if its size
>     * exceeds 20% of the previous fill's days supplied
>     by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>         (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > ///
>         0.2*mdaysup[_n-1]) if chng
>     by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>     replace refilldt = dispdt + mdaysup - 1
>     * drop the previous fill now that it has been absorbed
>     by ptdrugid: drop if chng[_n+1]==1
>     by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
>     sum chng, meanonly
>     if `r(sum)' > 0 {
>         local x 1
>         * repeat while the fill now in position j still overlaps
>         while `x' > 0 {
>             by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>                 (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
>                 0.2*mdaysup[_n-1]) ///
>                 if chng
>             by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>             replace refilldt = dispdt + mdaysup - 1
>             by ptdrugid: drop if chng[_n+1]==1
>             by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
>             sum chng, meanonly
>             local x = `r(sum)'
>         }
>     }
> }
> exit
> ** end code **
>
> To my way of thinking, this is horribly inefficient. Among the issues
> that are immediately apparent to me: (1) once `nmax' has been set, it
> isn't altered despite the fact that the number of observations winds
> up being far smaller as fills are chained (too many attempts at the
> loop) and (2) I continue making loops over observations once they've
> been maximally condensed.
>
> Does anyone have any suggestions for making this code better?
>
> Thanks,
> Rebecca
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/