Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Equivalent of Excel's COUNTIF
From
Richard Herron <[email protected]>
To
[email protected]
Subject
Re: st: Equivalent of Excel's COUNTIF
Date
Wed, 16 Nov 2011 14:30:39 -0500
The -reshape- solution is faster than the loop solution (timer 1 vs
timer 2 below). With 1e5 individuals the loop solution was beyond my
patience.
timer list
1: 0.12 / 1 = 0.1240
2: 13.42 / 1 = 13.4210
I had to modify our solutions a little.
* begin code
timer clear
timer on 1
clear
set obs 10000
set seed 10101
generate long id = _n
generate datein = runiform()*5000
generate dateout = datein + runiform()*15
reshape long date, i(id) j(inout) string
sort date
tempvar change
generate int `change' = cond(inout == "in", 1, -1)
generate int total = sum(`change') - 1
timer off 1
timer list
timer on 2
clear
set obs 10000
set seed 10101
generate arrival = runiform()*5000
generate discharge = arrival + runiform()*15
gen long npatients = .
gen long _num_discharged = .
sort arrival
forvalues k=1/`=_N' {
quietly replace _num_discharged = sum( discharge <= arrival[`k'] )
quietly replace npatients = (_n-1) - _num_discharged[`k'] in `k'
}
timer off 2
timer list
* end code
On Wed, Nov 16, 2011 at 11:44, Stas Kolenikov <[email protected]> wrote:
> On Wed, Nov 16, 2011 at 11:04 AM, Richard Herron
> <[email protected]> wrote:
>> Here is an alternative solution with -reshape-, -cond-, and -sum-.
>
> Cute solution!
>
>> The last two functions should be fast at any scale, but I don't have
>> enough experience with Stata to know if -reshape- is faster than a
>> loop.
>
> That's easy to check: set obs 10M instead of 10, and see what will be
> faster (and whether the -reshape- will start breaking down with large
> data sets; it might or it might not). -reshape- appears to be using a
> lot of I/O with explicit -use-, -save- and -merge- in the code; I
> thought this would have been written in C or Mata -- there's a
> reshape() function in Mata.
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/