Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Equivalent of Excel's COUNTIF
From
"Teubner, David (Health)" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Equivalent of Excel's COUNTIF
Date
Thu, 17 Nov 2011 10:59:01 +1030
Thank-you everyone for your help - fantastic!
David Teubner
On 17/11/2011, at 6:45 AM, Robert Picard wrote:
> You can avoid the -reshape- overhead by duplicating each patient
> record directly using -expand-. Also, there is no need to do another
> -reshape- to get back to the original data form, you can simply drop
> the extra record. Here's a reworked example using Stata datetime
> internal format:
>
> *----------- begin example -------------
>
> clear all
> set seed 10101
>
> * the number of milliseconds in a day
> scalar mday = 24 * 60 * 60 * 1000
>
> * create patient records
> set obs 10000
> gen double timein = clock("2011 1 1","YMD") + runiform()*180*mday
> gen double timeout = timein + runiform()*10*mday
> format timein timeout %tc
> gen id = _n
>
> * duplicate observations and order by event time
> expand 2
> sort id
> by id: gen double etime = cond(_n==1,timein,timeout)
> by id: gen event = cond(_n==1,1,-1)
> sort etime id
>
> * the population goes up or down
> gen pop = sum(event) - 1
>
> * drop the extra record
> drop if event == -1
> drop etime event
>
> *------------ end example --------------
>
>
>
> On Wed, Nov 16, 2011 at 2:30 PM, Richard Herron
> <[email protected]> wrote:
>> The -reshape- solution is faster than the loop solution (timer 1 vs
>> timer 2 below). With 1e5 individuals the loop solution was beyond my
>> patience.
>>
>> timer list
>> 1: 0.12 / 1 = 0.1240
>> 2: 13.42 / 1 = 13.4210
>>
>> I had to modify our solutions a little.
>>
>> * begin code
>> timer clear
>> timer on 1
>> clear
>> set obs 10000
>> set seed 10101
>> generate long id = _n
>> generate datein = runiform()*5000
>> generate dateout = datein + runiform()*15
>>
>> reshape long date, i(id) j(inout) string
>> sort date
>> tempvar change
>> generate int `change' = cond(inout == "in", 1, -1)
>> generate int total = sum(`change') - 1
>> timer off 1
>> timer list
>>
>> timer on 2
>> clear
>> set obs 10000
>> set seed 10101
>> generate arrival = runiform()*5000
>> generate discharge = arrival + runiform()*15
>> gen long npatients = .
>> gen long _num_discharged = .
>> sort arrival
>> forvalues k=1/`=_N' {
>> quietly replace _num_discharged = sum( discharge <= arrival[`k'] )
>> quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k'
>> }
>> timer off 2
>>
>> timer list
>> * end code
>>
>> On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <[email protected]> wrote:
>>> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron
>>> <[email protected]> wrote:
>>>> Here is an alternative solution with -reshape-, -cond-, and -sum-.
>>>
>>> Cute solution!
>>>
>>>> The last two functions should be fast at any scale, but I don't have
>>>> enough experience with Stata to know if -reshape- is faster than a
>>>> loop.
>>>
>>> That's easy to check: set obs 10M instead of 10, and see what will be
>>> faster (and whether the -reshape- will start breaking down with large
>>> data sets; it might or it might not). -reshape- appears to be using a
>>> lot of I/O with explicit -use-, -save- and -merge- in the code; I
>>> thought this would have been written in C or Mata -- there's a
>>> reshape() function in Mata.
>>>
>>> --
>>> Stas Kolenikov, also found at http://stas.kolenikov.name
>>> Small print: I use this email account for mailing lists only.
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/