Re: st: sequential subscript processing
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: Re: st: sequential subscript processing
Date: Fri, 29 Mar 2013 09:54:22 -0500
Nick,
Thanks for the additional information about single loops. I read the
tip and rewrote the code using that approach. You are right about it
running slower. I will note one advantage though: just looking at the
two, I think the logic of the single loop code is easier to follow.
Regards,
Rebecca
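
The single loop over observations discussed in this thread can be sketched
roughly as follows. This is a minimal illustration, not the code referred to
above: it uses a few rows taken from the example data quoted below, and the
names todrop, prev, gap and add are hypothetical helpers introduced here. It
assumes the same chaining arithmetic as the quoted code (overlap measured as
dispdt minus the previous refilldt, subtracted only when it exceeds 20% of the
previous days supplied).

```stata
* Sketch only: one pass over observations, chaining each fill onto the
* most recent kept fill for the same ptdrugid.
clear
input ptdrugid dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 16877 30
15 16903 30
15 16952 30
15 17934 30
15 17934 10
15 17952 30
end
gen long refilldt = dispdt + daysuppl - 1
gen long mdaysup  = daysuppl
gen byte todrop   = 0               // hypothetical flag: 1 = absorbed into an earlier fill
sort ptdrugid dispdt refilldt

local prev = 1                      // row holding the current chained fill
forvalues i = 2/`=_N' {
    if ptdrugid[`i'] == ptdrugid[`prev'] & dispdt[`i'] <= refilldt[`prev'] {
        * fill i overlaps the chained fill: gap is <= 0
        local gap = dispdt[`i'] - refilldt[`prev']
        local add = daysuppl[`i']
        if abs(`gap') > 0.2*mdaysup[`prev'] {
            * overlap exceeds 20% of the previous supply: subtract it
            local add = `add' + `gap'
        }
        quietly replace mdaysup  = mdaysup + `add' in `prev'
        quietly replace refilldt = dispdt + mdaysup - 1 in `prev'
        quietly replace todrop   = 1 in `i'
    }
    else {
        * no overlap, or a new ptdrugid: start a new chain at row i
        local prev = `i'
    }
}
drop if todrop
format dispdt refilldt %td
list ptdrugid dispdt refilldt mdaysup, sepby(ptdrugid)
```

Each -replace ... in- touches a single observation, so on a large file this
is expected to run slower than the by-group version, which is the trade-off
noted above; the check on ptdrugid is what stops a chain from crossing from
one id to the next.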
On Wed, Mar 27, 2013 at 8:02 PM, Nick Cox <[email protected]> wrote:
> I've looked through this code. My only strategic suggestion is that it
> might be simplified if you had a single loop over observations and
> (naturally) had an inbuilt check that each observation referred to the
> same id as the previous. But then a single loop over observations can
> be notoriously slow and you are trying to avoid that.
>
> That is, your problem seemed similar in some ways to those discussed in
>
> SJ-7-3  pr0033  Stata tip 51: Events in intervals (N. J. Cox)
>         Q3/07   SJ 7(3):440--443   (no commands)
>         tip for counting or summarizing irregularly spaced
>         events in intervals
>
>
>
> On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <[email protected]> wrote:
>> This is a question about efficiency. The code I've written produces
>> the output I need; it just seems to me that it could be improved.
>>
>> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
>> obs[2] _after_ obs[2] has been conditionally changed by the values in
>> obs[1]. For context, the goal is to "chain" prescription refills
>> together to calculate 180-day medication possession ratios. Everyone
>> in the data has at least one refill. For any of you who work with
>> MPRs, don't panic: this isn't the extent of the calculation or the
>> rules. I'm using "refill" loosely; it includes titrations. The goal
>> with this example was to capture the essential issue with the dates.
>>
>> Definitions:
>> "dispensing date" - date the pharmacy provides the medication to the patient
>> "fill" - a distinct dispensing date+medication combination
>> "refill date" - when the medication is projected to be filled again
>> "days supply" - the number of days for which the prescription provides
>> medication (usually 30, 60, or 90)
>>
>> The rules are:
>> 1. If a patient's refill overlaps the previous fill by more than 20%
>> of the previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)) less the number of days of overlap. I.e. truncate
>> the previous fill's days supplied & assume use of the refill starts on
>> the day it is dispensed.
>>
>> 2. If a patient's refill overlaps the previous fill by <= 20% of the
>> previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)). I.e. shift dispensing date of refill to the end of
>> the previous fill.
>>
>> I think I've got a good start on this with -forvalues- and -while-.
>> I've put a sample of the data below. As a note, this data has been
>> de-identified before posting. The dates have been jittered from the
>> real dates, but I've replicated all of the major features. The
>> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
>> shortnm)-.
>>
>> ** begin code **
>> clear
>> input ptdrugid _dispdt daysuppl
>> 14 18000 30
>> 14 18031 30
>> 14 18128 30
>> 15 16877 30
>> 15 16903 30
>> 15 16952 30
>> 15 16987 30
>> 15 17010 30
>> 15 17047 30
>> 15 17073 30
>> 15 17093 30
>> 15 17132 30
>> 15 17165 30
>> 15 17194 30
>> 15 17224 30
>> 15 17249 30
>> 15 17286 30
>> 15 17327 30
>> 15 17357 30
>> 15 17385 30
>> 15 17413 30
>> 15 17445 30
>> 15 17474 30
>> 15 17500 30
>> 15 17534 30
>> 15 17568 30
>> 15 17597 30
>> 15 17620 30
>> 15 17645 30
>> 15 17669 30
>> 15 17702 30
>> 15 17728 30
>> 15 17758 30
>> 15 17796 30
>> 15 17818 30
>> 15 17861 30
>> 15 17898 30
>> 15 17934 30
>> 15 17934 10
>> 15 17952 30
>> 15 17971 30
>> 15 18002 30
>> 15 18032 30
>> 15 18075 30
>> 15 18096 30
>> 15 18107 90
>> 15 18190 90
>> end
>> gen _refilldt = _dispdt+daysuppl-1
>> format _dispdt _refilldt %td
>> clonevar dispdt = _dispdt
>> clonevar refilldt = _refilldt
>> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
>> sum _seq, meanonly
>> local nmax = `r(max)'
>> gen chng = 0
>> clonevar mdaysup = daysuppl
>> forvalues j = 2/`nmax' {
>>     * flag the j-th fill in each group if it starts on or before the previous refill date
>>     by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>>     * chain the fills; subtract the overlap only if it exceeds 20% of the previous supply
>>     by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>         (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > ///
>>         0.2*mdaysup[_n-1]) if chng
>>     by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>     replace refilldt = dispdt + mdaysup - 1
>>     * drop the previous fill now that it has been absorbed
>>     by ptdrugid: drop if chng[_n+1]==1
>>     by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
>>     sum chng, meanonly
>>     if `r(sum)' > 0 {
>>         local x 1
>>         * repeat until no fill at position j overlaps its predecessor
>>         while `x' > 0 {
>>             by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>                 (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
>>                 0.2*mdaysup[_n-1]) ///
>>                 if chng
>>             by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>             replace refilldt = dispdt + mdaysup - 1
>>             by ptdrugid: drop if chng[_n+1]==1
>>             by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
>>             sum chng, meanonly
>>             local x = `r(sum)'
>>         }
>>     }
>> }
>> exit
>> ** end code **
>>
>> To my way of thinking, this is horribly inefficient. Among the issues
>> that are immediately apparent to me: (1) once `nmax' has been set, it
>> isn't altered despite the fact that the number of observations winds
>> up being far smaller as fills are chained (too many attempts at the
>> loop) and (2) I continue making loops over observations once they've
>> been maximally condensed.
>>
>> Does anyone have any suggestions for making this code better?
>>
>> Thanks,
>> Rebecca
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/