Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: sequential subscript processing

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: sequential subscript processing
Date	Thu, 28 Mar 2013 01:02:18 +0000

I've looked through this code. My only strategic suggestion is that it
might be simplified if you had a single loop over observations and
(naturally) had an inbuilt check that each observation referred to the
same id as the previous. But then a single loop over observations can
be notoriously slow and you are trying to avoid that.

That is, your problem seemed similar in some ways to those discussed in

SJ-7-3  pr0033  Stata tip 51: Events in intervals (N. J. Cox)
        Q3/07   SJ 7(3): 440-443   (no commands)
        Tip for counting or summarizing irregularly spaced events in intervals.
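Here is a minimal sketch of that single loop over observations. It assumes the loop is written in Mata rather than as a Stata-level loop over observations, because a Mata loop is typically much faster. The function chain_fills() and the flag variable tokeep are hypothetical names. The rules are applied as in Rebecca's posted code: the overlap is measured as the refill date of the chain so far minus the new dispensing date. The data are a subset of her posted example. Each observation is compared with the one just before it, which already holds the chained values, and the loop carries the check that both rows belong to the same ptdrugid.

```stata
* Hypothetical sketch: chain fills in a single pass over observations in Mata.
* Data: a subset of the example posted below.
clear
input ptdrugid _dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 16877 30
15 16903 30
15 16952 30
15 17898 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
15 18002 30
end
* same order as sorting on _dispdt _refilldt, since _refilldt = _dispdt + daysuppl - 1
sort ptdrugid _dispdt daysuppl
clonevar dispdt = _dispdt
clonevar mdaysup = daysuppl
gen byte tokeep = 1        // hypothetical flag: 0 = absorbed into the next fill

capture mata mata drop chain_fills()
mata:
// hypothetical helper; columns of vars: id, dispensing date, supply, keep flag
void chain_fills(string rowvector vars)
{
    real matrix X
    real scalar i, r0, gap, sup

    st_view(X, ., vars)                      // a view: changes write back to the data
    for (i = 2; i <= rows(X); i++) {
        if (X[i, 1] != X[i-1, 1]) continue   // new ptdrugid: start a new chain
        r0 = X[i-1, 2] + X[i-1, 3] - 1       // refill date of the chain so far
        gap = X[i, 2] - r0                   // <= 0 means the fills overlap
        if (gap > 0) continue                // no overlap: start a new chain
        sup = X[i-1, 3] + X[i, 3]            // rule 2: add the two supplies
        if (abs(gap) > 0.2*X[i-1, 3]) sup = sup + gap   // rule 1: minus the overlap
        X[i, 2] = X[i-1, 2]                  // keep the earlier dispensing date
        X[i, 3] = sup
        X[i-1, 4] = 0                        // previous row is now part of row i
    }
}
end

mata: chain_fills(("ptdrugid", "dispdt", "mdaysup", "tokeep"))
keep if tokeep
gen refilldt = dispdt + mdaysup - 1
format dispdt refilldt %td
list ptdrugid dispdt mdaysup refilldt, sepby(ptdrugid) noobs
```

Because each row's values are updated before the next row is examined, obs[3] is compared with obs[2] after obs[2] has been changed, which is the sequential behaviour the question asks for.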



On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <[email protected]> wrote:
> This is a question about efficiency. The code I've written produces
> the output I need; it just seems to me that it could be improved.
>
> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
> obs[2] _after_ obs[2] has been conditionally changed by the values in
> obs[1]. For context, the goal is to "chain" prescription refills
> together to calculate 180-day medication possession ratios. Everyone
> in the data has at least one refill. For any of you who work with
> MPRs, don't panic: this isn't the extent of the calculation or the
> rules. I'm using "refill" loosely; it includes titrations. The goal
> with this example was to capture the essential issue with the dates.
>
> Definitions:
> "dispensing date" - date the pharmacy provides the medication to the patient
> "fill" - a distinct dispensing date+medication combination
> "refill date" - when the medication is projected to be filled again
> "days supply" - the number of days for which the prescription provides
> medication (usually 30, 60, or 90)
>
> The rules are:
> 1. If a patient's refill overlaps the previous fill by more than 20%
> of the previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date and adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)) less the number of days of overlap. I.e. truncate
> the previous fill's days supplied and assume use of the refill starts on
> the day it is dispensed.
>
> 2. If a patient's refill overlaps the previous fill by <= 20% of the
> previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date and adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)). I.e. shift the dispensing date of the refill to the end of
> the previous fill.
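The following worked example applies the two rules to pairs of fills from the posted data. It measures the overlap the same way the posted code does: the refill date of the previous fill minus the dispensing date of the refill, where refill date = dispensing date + days supply - 1. The scalar names are hypothetical.

```stata
* Hypothetical worked example of the two rules (overlap measured as in the posted code)

* Rule 2: two 30-day fills, dispensed on 16877 and 16903
scalar ovl2 = (16877 + 30 - 1) - 16903           // 16906 - 16903 = 3 days
* 3 <= 0.2*30 = 6, so nothing is subtracted: 30 + 30 = 60 days
scalar sup2 = 30 + 30 - scalar(ovl2)*(scalar(ovl2) > 0.2*30)
display "rule 2: overlap " scalar(ovl2) ", chained supply " scalar(sup2) " days from " %td 16877

* Rule 1: a 10-day fill followed (in the posted sort order) by a 30-day fill,
* both dispensed on 17934
scalar ovl1 = (17934 + 10 - 1) - 17934           // 17943 - 17934 = 9 days
* 9 > 0.2*10 = 2, so the overlap is subtracted: 10 + 30 - 9 = 31 days
scalar sup1 = 10 + 30 - scalar(ovl1)*(scalar(ovl1) > 0.2*10)
display "rule 1: overlap " scalar(ovl1) ", chained supply " scalar(sup1) " days from " %td 17934
```

If both the dispensing date and the refill date count as covered days, the second pair shares 10 calendar days, not 9. Fully truncating the 10-day fill would then leave a 30-day chain rather than 31. Whether the overlap should be refilldt[_n-1] - dispdt or refilldt[_n-1] - dispdt + 1 depends on the convention intended.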
>
> I think I've got a good start on this with -forvalues- and -while-.
> I've put a sample of the data below. As a note, this data has been
> de-identified before posting. The dates have been jittered from the
> real dates, but I've replicated all of the major features. The
> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
> shortnm)-.
>
> ** begin code **
> clear
> input    ptdrugid   _dispdt   daysuppl
>           14     18000         30
>           14     18031         30
>           14     18128         30
>           15     16877         30
>           15     16903         30
>           15     16952         30
>           15     16987         30
>           15     17010         30
>           15     17047         30
>           15     17073         30
>           15     17093         30
>           15     17132         30
>           15     17165         30
>           15     17194         30
>           15     17224         30
>           15     17249         30
>           15     17286         30
>           15     17327         30
>           15     17357         30
>           15     17385         30
>           15     17413         30
>           15     17445         30
>           15     17474         30
>           15     17500         30
>           15     17534         30
>           15     17568         30
>           15     17597         30
>           15     17620         30
>           15     17645         30
>           15     17669         30
>           15     17702         30
>           15     17728         30
>           15     17758         30
>           15     17796         30
>           15     17818         30
>           15     17861         30
>           15     17898         30
>           15     17934         30
>           15     17934         10
>           15     17952         30
>           15     17971         30
>           15     18002         30
>           15     18032         30
>           15     18075         30
>           15     18096         30
>           15     18107         90
>           15     18190         90
> end
> gen _refilldt = _dispdt+daysuppl-1
> format _dispdt _refilldt %td
> clonevar dispdt = _dispdt
> clonevar refilldt = _refilldt
> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
> sum _seq, meanonly
> local nmax = `r(max)'
> gen chng = 0
> clonevar mdaysup = daysuppl
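> * pass j: try to chain the j-th fill of each ptdrugid onto the one before it
> forvalues j = 2/`nmax' {
>  * flag the j-th fill if it is dispensed on or before the previous refill date
>  by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>  * add the two supplies; subtract the overlap only if it exceeds 20%
>  * of the previous supply (rule 1), otherwise keep the sum (rule 2)
>  by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>   (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
>   if chng
>  by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>  replace refilldt = dispdt + mdaysup - 1
>  * the previous fill is now absorbed into the flagged one: drop it
>  by ptdrugid: drop if chng[_n+1]==1
>  * re-check position j, which now holds the next fill
>  by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
>  sum chng, meanonly
>  if `r(sum)' > 0 {
>   * keep chaining at position j until no group overlaps there
>   local x 1
>   while `x' > 0 {
>    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>     (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
>     if chng
>    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>    replace refilldt = dispdt + mdaysup - 1
>    by ptdrugid: drop if chng[_n+1]==1
>    by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
>    sum chng, meanonly
>    local x = `r(sum)'
>   }
>  }
> }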
> exit
> ** end code **
>
> To my way of thinking, this is horribly inefficient. Among the issues
> that are immediately apparent to me: (1) once `nmax' has been set, it
> isn't altered despite the fact that the number of observations winds
> up being far smaller as fills are chained (too many attempts at the
> loop) and (2) I continue making loops over observations once they've
> been maximally condensed.
>
> Does anyone have any suggestions for making this code better?
>
> Thanks,
> Rebecca
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
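Both inefficiencies Rebecca lists can be addressed without changing the merging steps. The sketch recomputes the upper bound from the current group sizes after each drop and advances j only when nothing at position j can still be chained. It is a minimal restructuring under that assumption: gsize is a hypothetical helper variable, and the data are a subset of the posted example.

```stata
* Hypothetical restructuring of the posted loop (gsize is a helper name).
* Data: a subset of the posted example.
clear
input ptdrugid _dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 16877 30
15 16903 30
15 16952 30
15 17898 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
15 18002 30
end
gen _refilldt = _dispdt + daysuppl - 1
format _dispdt _refilldt %td
clonevar dispdt = _dispdt
clonevar refilldt = _refilldt
clonevar mdaysup = daysuppl
gen chng = 0
bysort ptdrugid (_dispdt _refilldt): gen gsize = _N   // current group sizes
summarize gsize, meanonly
local nmax = r(max)
local j 2
while `j' <= `nmax' {
    quietly by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n == `j'
    quietly count if chng
    if r(N) == 0 {
        * nothing left to chain at position j in any group: move on
        local ++j
    }
    else {
        quietly {
            * same merging steps as the posted code
            by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
                (dispdt - refilldt[_n-1]) * ///
                (abs(dispdt - refilldt[_n-1]) > 0.2*mdaysup[_n-1]) if chng
            by ptdrugid: replace dispdt = dispdt[_n-1] if chng
            replace refilldt = dispdt + mdaysup - 1
            by ptdrugid: drop if chng[_n+1] == 1
            * groups shrink as fills are chained, so update the upper bound
            by ptdrugid: replace gsize = _N
            summarize gsize, meanonly
            local nmax = r(max)
        }
    }
}
drop gsize
list ptdrugid dispdt mdaysup refilldt, sepby(ptdrugid) noobs
```

Each merge removes one observation, so the loop terminates. It still passes over the whole dataset once per merge step, so with large files a single pass over observations, as Nick suggests, may be faster.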

Follow-Ups:
- Re: st: sequential subscript processing
  - From: Rebecca Pope <[email protected]>

References:
- st: sequential subscript processing
  - From: Rebecca Pope <[email protected]>
