Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: median of consecutive groups - avoiding loops
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: median of consecutive groups - avoiding loops
Date: Thu, 12 May 2011 08:47:46 +0100
Interesting. There are lots of side issues here. Here are a few:
1. The loop solution is not especially fast, but I guess that is not
primarily because it is a loop. The code uses -summarize, detail- to
get the median because plain -summarize- doesn't provide it, but that
wastes a lot of effort on also calculating things like the variance,
skewness and kurtosis, which are not of interest here.
2. -egen- usually slows things down because it must interpret many
command lines, many of which do little. But -egen-'s -median()-
function is a smart one: it gets the median directly from sorted
data, and Stata is very fast at sorting.
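A leaner version of the loop in point 1 can be sketched with -_pctile-, which computes only the requested percentile and leaves it in r(r1); with its default definition it should agree with r(p50) from -summarize, detail-. This is a sketch, not code from the thread: the program name -pairmedian_loop- and its options are hypothetical, and it assumes the groups are coded 1, 2, ..., G, as in daniel's simulation.

```stata
* Hypothetical helper (not from the thread): for each group g, coded
* 1, 2, ..., G, store in generate() the median of varname pooled over
* groups g and g+1. -_pctile- computes only the requested percentile,
* avoiding the extra moments that -summarize, detail- also calculates.
capture program drop pairmedian_loop
program pairmedian_loop
    version 11
    syntax varname(numeric), GRoup(varname numeric) GENerate(name)
    quietly {
        generate double `generate' = .
        summarize `group', meanonly
        local last = r(max) - 1
        forvalues i = 1/`last' {
            local j = `i' + 1
            // median of the two groups pooled, returned in r(r1)
            _pctile `varlist' if inlist(`group', `i', `j'), p(50)
            replace `generate' = r(r1) if `group' == `i'
        }
    }
end
```

For example, -pairmedian_loop value, group(group) generate(mymedian)- mirrors the loop in daniel's code. It is still a loop with an -if- condition evaluated over all observations for every group, so it trims only the -summarize, detail- overhead.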
The problem sounds so odd that I am not tempted to work much more at it!
Nick
On Thu, May 12, 2011 at 1:23 AM, daniel klein
<[email protected]> wrote:
> This question is indeed interesting. An ad hoc simulation suggests that
> the answer depends on the number of groups. The loop performs well when
> the number of groups is small (10), but it slows down considerably as
> the number of groups increases (100). The speed of the "egen" solution
> does not seem to depend on the number of groups (all runs with
> N = 10,000). I guess Stata did a good job writing the -by- prefix.
> The simulations have equal group sizes. Overall, the "egen" solution
> seems to outperform the loop.
>
> It would be interesting to see whether Mata could speed things up (as I
> would expect). But then again, I guess that in "real life" the
> differences will not matter much.
>
> Here's the simulation (syntax is -ahsim obs number_of_groups-).
>
> cap prog drop ahsim
>
> prog ahsim
> args obs ngroups
> if "`obs'" == "" loc obs 10000
> if "`ngroups'" == "" loc ngroups 10
> clear all
> qui {
> set obs `ngroups'
> g group = _n
> expand `obs'/`ngroups'
> sort group
> g value = rnormal()
> }
> di _n "{txt}Groups: `ngroups'"
> di "{txt}Obs: " _N
>
> timer clear
>
> * Timer 1: loop over groups, pooling group i with group i+1
> timer on 1
> su group, meanonly
> local last = r(max) - 1
>
> qui gen mymedian = .
>
> qui forval i = 1/`last' {
> local j = `i' + 1
> su value if inlist(group, `i', `j') , detail
> replace mymedian = r(p50) if group == `i'
> }
> timer off 1
>
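> * Timer 2: pair the groups two ways so that each group g is pooled
> * with g+1 (odd g via newgroup1, even g via newgroup2); unlike the
> * loop, the last group gets its own median instead of a missing value
> timer on 2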
> g int newgroup1 = cond(mod(group, 2), group, group-1)
> g int newgroup2 = cond(mod(group, 2), group-1, group)
> bys newgroup1 : egen med1 = median(value)
> bys newgroup2 : egen med2 = median(value)
> g median = cond(mod(group, 2), med1, med2)
> drop newgroup1 newgroup2 med1 med2
> timer off 2
>
> timer list
>
> di _n "{txt}1: loop"
> di "{txt}2: egen"
> end
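On daniel's Mata question, here is a minimal sketch. It is untimed and not from the thread, and the function name -pairmedian()- is hypothetical. It assumes the data are sorted by the group variable, that each group should be pooled with the next group in sort order, that the value variable has no missing values, and that the result variable already exists. After one sort, -panelsetup()- gives the first and last observation of each group, so each pooled median touches only the rows of two groups instead of scanning the whole dataset.

```stata
* Hypothetical sketch: pooled median of each group and its successor in Mata.
* Assumes the data are sorted by the group variable and the value variable
* has no missing values; the result variable must already exist.
capture mata: mata drop pairmedian()
mata:
void pairmedian(string scalar gvar, string scalar vvar, string scalar newvar)
{
    real colvector g, v, x, res
    real matrix    info
    real scalar    k, n, m

    g = st_data(., gvar)
    v = st_data(., vvar)
    // one row per group: first and last observation number of that group
    info = panelsetup(g, 1)
    res  = J(rows(v), 1, .)
    // the last group has no successor, so its rows stay missing
    for (k = 1; k < rows(info); k++) {
        // values of group k and group k+1, sorted ascending
        x = sort(v[|info[k, 1] \ info[k + 1, 2]|], 1)
        n = rows(x)
        // middle value if n is odd, mean of the two middle values if even
        m = (x[floor((n + 1) / 2)] + x[ceil((n + 1) / 2)]) / 2
        res[|info[k, 1] \ info[k, 2]|] = J(info[k, 2] - info[k, 1] + 1, 1, m)
    }
    st_store(., newvar, res)
}
end
```

Usage: -sort group-, then -generate double mymedian2 = .-, then -mata: pairmedian("group", "value", "mymedian2")-. As with the loop, the last group is left missing; whether this beats the -egen- solution would need timing.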