Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Teubner, David (Health)" <david.teubner@health.sa.gov.au> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Equivalent of Excel's COUNTIF |
Date | Thu, 17 Nov 2011 10:59:01 +1030 |
Thank-you everyone for your help - fantastic! David Teubner On 17/11/2011, at 6:45 AM, Robert Picard wrote: > You can avoid the -reshape- overhead by duplicating each patient > record directly using -expand-. Also, there is no need to do another > -reshape- to get back to the original data form, you can simply drop > the extra record. Here's a reworked example using Stata datetime > internal format: > > *----------- begin example ------------- > > clear all > set seed 10101 > > * the number of milliseconds in a day > scalar mday = 24 * 60 * 60 * 1000 > > * create patient records > set obs 10000 > gen double timein = clock("2011 1 1","YMD") + runiform()*180*mday > gen double timeout = timein + runiform()*10*mday > format timein timeout %tc > gen id = _n > > * duplicate observations and order by event time > expand 2 > sort id > by id: gen double etime = cond(_n==1,timein,timeout) > by id: gen event = cond(_n==1,1,-1) > sort etime id > > * the population goes up or down > gen pop = sum(event) - 1 > > * drop the extra record > drop if event == -1 > drop etime event > > *------------ end example -------------- > > > > On Wed, Nov 16, 2011 at 2:30 PM, Richard Herron > <richard.c.herron@gmail.com> wrote: >> The -reshape- solution is faster than the loop solution (timer 1 vs >> timer 2 below). With 1e5 individuals the loop solution was beyond my >> patience. >> >> timer list >> 1: 0.12 / 1 = 0.1240 >> 2: 13.42 / 1 = 13.4210 >> >> I had to modify our solutions a little. >> >> * begin code >> timer clear >> timer on 1 >> clear >> set obs 10000 >> set seed 10101 >> generate long id = _n >> generate datein = runiform()*5000 >> generate dateout = datein + runiform()*15 >> >> reshape long date, i(id) j(inout) string >> sort date >> tempvar change >> generate int `change' = cond(inout == "in", 1, -1) >> generate int total = sum(`change') - 1 >> timer off 1 >> timer list >> >> timer on 2 >> clear >> set obs 10000 >> set seed 10101 >> generate arrival = runiform()*5000 >> generate discharge = arrival + runiform()*15 >> gen long npatients = . >> gen long _num_discharged = . >> sort arrival >> forvalues k=1/`=_N' { >> quietly replace _num_discharged = sum( discharge <= arrival[`k'] ) >> quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k' >> } >> timer off 2 >> >> timer list >> * end code >> >> On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <skolenik@gmail.com> wrote: >>> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron >>> <richard.c.herron@gmail.com> wrote: >>>> Here is an alternative solution with -reshape-, -cond-, and -sum-. >>> >>> Cute solution! >>> >>>> The last two functions should be fast at any scale, but I don't have >>>> enough experience with Stata to know if -reshape- is faster than a >>>> loop. >>> >>> That's easy to check: set obs 10M instead of 10, and see what will be >>> faster (and whether the -reshape- will start breaking down with large >>> data sets; it might or it might not). -reshape- appears to be using a >>> lot of I/O with explicit -use-, -save- and -merge- in the code; I >>> thought this would have been written in C or Mata -- there's a >>> reshape() function in Mata. >>> >>> -- >>> Stas Kolenikov, also found at http://stas.kolenikov.name >>> Small print: I use this email account for mailing lists only. >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/statalist/faq >>> * http://www.ats.ucla.edu/stat/stata/ >>> >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ >> > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/