Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: sequential subscript processing

From	Rebecca Pope <[email protected]>
To	[email protected]
Subject	Re: st: sequential subscript processing
Date	Fri, 29 Mar 2013 09:54:22 -0500

Nick,
Thanks for the additional information about single loops. I read the
tip and rewrote the code using that approach. You are right about it
running slower. I will note one advantage though: just looking at the
two, I think the logic of the single loop code is easier to follow.

Regards,
Rebecca
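
For readers following along, a single loop over observations of the kind discussed here might look like the sketch below. It is not the code that was actually run for this thread. It assumes the chaining rules and variable names from the original post quoted below, uses a few rows from the posted sample data, overwrites dispdt directly instead of keeping a clone of the original date, and introduces a hypothetical flag variable (absorbed) and hypothetical locals (p, gap, prev, adj).

```stata
clear
input ptdrugid dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
15 18002 30
end
gen refilldt = dispdt + daysuppl - 1
gen mdaysup = daysuppl
gen byte absorbed = 0          // hypothetical flag: fill was merged into the next one
sort ptdrugid dispdt refilldt

local N = _N
forvalues i = 2/`N' {
    local p = `i' - 1
    * inbuilt check: same id as the previous observation, and an overlap
    if ptdrugid[`i'] == ptdrugid[`p'] & dispdt[`i'] <= refilldt[`p'] {
        local gap = dispdt[`i'] - refilldt[`p']     // zero or negative
        local prev = mdaysup[`p']
        * subtract the overlap only when it exceeds 20% of the previous supply
        local adj = cond(abs(`gap') > 0.2*`prev', `gap', 0)
        quietly replace mdaysup = `prev' + mdaysup + (`adj') in `i'
        quietly replace dispdt = dispdt[`p'] in `i'
        quietly replace refilldt = dispdt + mdaysup - 1 in `i'
        quietly replace absorbed = 1 in `p'
    }
}
drop if absorbed
format dispdt refilldt %td
list, sepby(ptdrugid)
```

Because each observation is compared with the already updated observation just before it, the chained values carry forward without any nested -while-. The price is one pass through the interpreter per observation, which is why this form tends to run slower on large data.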



On Wed, Mar 27, 2013 at 8:02 PM, Nick Cox <[email protected]> wrote:
> I've looked through this code. My only strategic suggestion is that it
> might be simplified if you had a single loop over observations and
> (naturally) had an inbuilt check that each observation referred to the
> same id as the previous. But then a single loop over observations can
> be notoriously slow and you are trying to avoid that.
>
> That is, your problem seemed similar in some ways to those discussed in
>
> SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in intervals
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q3/07   SJ 7(3):440--443                                 (no commands)
>         tip for counting or summarizing irregularly spaced
>         events in intervals
>
>
>
> On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <[email protected]> wrote:
>> This is a question about efficiency. The code I've written produces
>> the output I need; it just seems to me that it could be improved.
>>
>> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
>> obs[2] _after_ obs[2] has been conditionally changed by the values in
>> obs[1]. For context, the goal is to "chain" prescription refills
>> together to calculate 180-day medication possession ratios. Everyone
>> in the data has at least one refill. For any of you who work with
>> MPRs, don't panic: this isn't the extent of the calculation or the
>> rules. I'm using "refill" loosely; it includes titrations. The goal
>> with this example was to capture the essential issue with the dates.
>>
>> Definitions:
>> "dispensing date" - date the pharmacy provides the medication to the patient
>> "fill" - a distinct dispensing date+medication combination
>> "refill date" - when the medication is projected to be filled again
>> "days supply" - the number of days for which the prescription provides
>> medication (usually 30, 60, or 90)
>>
>> The rules are:
>> 1. If a patient's refill overlaps the previous fill by more than 20%
>> of the previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)) less the number of days of overlap. I.e. truncate
>> the previous fill's days supplied & assume use of the refill starts on
>> the day it is dispensed.
>>
>> 2. If a patient's refill overlaps the previous fill by <= 20% of the
>> previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)). I.e. shift dispensing date of refill to the end of
>> the previous fill.
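
The arithmetic of the two rules above can be checked on a pair of made-up fills. This is only an illustrative sketch: the dates and the variable names (prevdisp, prevsup, disp, sup, prevend, overlap, chained) are hypothetical, and the overlap is measured the same way as in the posted code below, as the previous fill's refill date minus the current dispensing date.

```stata
clear
input prevdisp prevsup disp sup
100 30 120 30
100 30 127 30
end
* the previous fill covers days 100 through 129
gen prevend = prevdisp + prevsup - 1
gen overlap = prevend - disp
* rule 1 subtracts the overlap when it exceeds 20% of the previous supply;
* rule 2 otherwise keeps the full sum
gen chained = prevsup + sup - overlap*(overlap > 0.2*prevsup)
list
```

In the first row the overlap of 9 days exceeds 6 (20% of 30), so the chained supply is 30 + 30 - 9 = 51. In the second row the overlap of 2 days does not, so the chained supply is the full 60.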
>>
>> I think I've got a good start on this with -forvalues- and -while-.
>> I've put a sample of the data below. As a note, this data has been
>> de-identified before posting. The dates have been jittered from the
>> real dates, but I've replicated all of the major features. The
>> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
>> shortnm)-.
>>
>> ** begin code **
>> clear
>> input    ptdrugid   _dispdt   daysuppl
>>           14     18000         30
>>           14     18031         30
>>           14     18128         30
>>           15     16877         30
>>           15     16903         30
>>           15     16952         30
>>           15     16987         30
>>           15     17010         30
>>           15     17047         30
>>           15     17073         30
>>           15     17093         30
>>           15     17132         30
>>           15     17165         30
>>           15     17194         30
>>           15     17224         30
>>           15     17249         30
>>           15     17286         30
>>           15     17327         30
>>           15     17357         30
>>           15     17385         30
>>           15     17413         30
>>           15     17445         30
>>           15     17474         30
>>           15     17500         30
>>           15     17534         30
>>           15     17568         30
>>           15     17597         30
>>           15     17620         30
>>           15     17645         30
>>           15     17669         30
>>           15     17702         30
>>           15     17728         30
>>           15     17758         30
>>           15     17796         30
>>           15     17818         30
>>           15     17861         30
>>           15     17898         30
>>           15     17934         30
>>           15     17934         10
>>           15     17952         30
>>           15     17971         30
>>           15     18002         30
>>           15     18032         30
>>           15     18075         30
>>           15     18096         30
>>           15     18107         90
>>           15     18190         90
>> end
>> gen _refilldt = _dispdt+daysuppl-1
>> format _dispdt _refilldt %td
>> clonevar dispdt = _dispdt
>> clonevar refilldt = _refilldt
>> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
>> sum _seq, meanonly
>> local nmax = `r(max)'
>> gen chng = 0
>> clonevar mdaysup = daysuppl
>> forvalues j = 2/`nmax' {
>>  * flag fill j of each ptdrugid if it overlaps the (possibly chained) previous fill
>>  by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>>  * chain: add days supplied, subtracting the overlap only if it exceeds 20%
>>  by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>   (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > ///
>>   0.2*mdaysup[_n-1]) if chng
>>  by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>  replace refilldt = dispdt + mdaysup - 1
>>  * drop the previous fill, which the chained fill now covers
>>  by ptdrugid: drop if chng[_n+1]==1
>>  * where a fill was dropped, the next fill has moved into position j
>>  by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
>>  sum chng, meanonly
>>  if `r(sum)' > 0 {
>>   local x 1
>>   * repeat until no fill in position j overlaps its predecessor
>>   while `x' > 0 {
>>    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>     (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
>>     0.2*mdaysup[_n-1]) ///
>>     if chng
>>    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>    replace refilldt = dispdt + mdaysup - 1
>>    by ptdrugid: drop if chng[_n+1]==1
>>    by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
>>    sum chng, meanonly
>>    local x  = `r(sum)'
>>   }
>>  }
>> }
>> exit
>> ** end code **
>>
>> To my way of thinking, this is horribly inefficient. Among the issues
>> that are immediately apparent to me: (1) once `nmax' has been set, it
>> isn't altered despite the fact that the number of observations winds
>> up being far smaller as fills are chained (too many attempts at the
>> loop) and (2) I continue making loops over observations once they've
>> been maximally condensed.
>>
>> Does anyone have any suggestions for making this code better?
>>
>> Thanks,
>> Rebecca
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
