Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: median of consecutive groups - avoiding loops
From: "Sarah Kristina Reuter" <[email protected]>
To: [email protected]
Subject: Re: st: median of consecutive groups - avoiding loops
Date: Thu, 12 May 2011 20:41:33 +0200
Daniel and Nick
Thank you so much for your discussion!
Daniel: your proposal works perfectly.
One last note for clarification: I wanted to avoid the loop because the data set is huge.
Whatever the reason, the -egen- solution is much faster.
Sarah
On 12 May 2011 at 8:47, Nick Cox wrote:
> Interesting. There are lots of side issues here. Here are a few:
>
> 1. That the loop solution is not especially fast is not primarily
> because it is a loop, I guess. The code uses -summarize, detail- to
> get the median because plain -summarize- doesn't return it, but that
> causes lots of wasted effort in also calculating things like
> variance, skewness and kurtosis, which are not of interest here.
>
> 2. -egen- usually slows things down because many command lines that
> often do not do much need to be interpreted. But -egen-'s -median()-
> function is a smart one and gets the median value directly from
> sorted data, and Stata is very fast at sorting.
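
A minimal sketch of both points, not part of the original exchange. It assumes toy data in a variable called value and an odd number of observations with no missing values. -_pctile- returns only the requested percentile (in r(r1)), so it skips the higher moments that -summarize, detail- computes; the last step reads the median straight off the sorted data, which is the idea behind point 2.

```stata
* Toy data: 1,001 standard normal draws (illustrative only)
clear
set seed 12345
set obs 1001
gen double value = rnormal()

* Median via -summarize, detail-, which also computes variance,
* skewness and kurtosis
quietly summarize value, detail
scalar med_detail = r(p50)

* Median via -_pctile-, which computes only the requested percentile
_pctile value, percentiles(50)
scalar med_pctile = r(r1)

* Median read directly from sorted data (odd N, no missing values)
sort value
scalar med_sorted = value[(_N + 1)/2]

display med_detail " " med_pctile " " med_sorted
```

With an even number of observations the sorted-data step would need the average of the two middle values instead.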
>
> The problem sounds so odd that I am not tempted to work much more
> at it!
>
> Nick
>
> On Thu, May 12, 2011 at 1:23 AM, daniel klein
> <[email protected]> wrote:
> > This question is indeed interesting. Ad hoc simulation shows that
> > the answer seems to depend on the number of groups. While the loop
> > performs well if the number of groups is small (10), it slows down
> > considerably as the number of groups increases (100). The speed of
> > the "egen" solution does not seem to depend on the number of groups
> > (all runs with N=10,000). Guess Stata did a good job writing the
> > -by- prefix. Simulations have equal group sizes. Overall, it seems
> > the "egen" solution outperforms the loop.
> >
> > It would be interesting to see whether one could speed things up
> > using Mata (as I would expect). But then again, I guess in "real
> > life" the differences will not matter much.
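
A rough sketch of what a Mata version might look like, not part of the original exchange and not benchmarked against the loop or -egen-. The function names vecmedian() and pairmedians(), the variable mata_median, and the toy data are all hypothetical; the sketch assumes the data are sorted by group and contain no missing values.

```stata
* Toy data: 6 groups of 5 observations, sorted by group (illustrative only)
clear
set seed 12345
set obs 30
gen int group = ceil(_n/5)
gen double value = rnormal()
sort group
gen double mata_median = .

* Drop earlier definitions so the block can be rerun
capture mata: mata drop vecmedian() pairmedians()

mata:
// Median of a column vector with no missing values
real scalar vecmedian(real colvector x)
{
    real colvector s
    real scalar    n

    s = sort(x, 1)
    n = rows(s)
    if (mod(n, 2)) return(s[(n + 1)/2])
    return((s[n/2] + s[n/2 + 1])/2)
}

// For each group i except the last, store the median of groups i and i+1
void pairmedians(string scalar gvar, string scalar xvar, string scalar outvar)
{
    real matrix    info
    real colvector g, x, out
    real scalar    i, m

    g    = st_data(., gvar)
    x    = st_data(., xvar)
    info = panelsetup(g, 1)
    out  = J(rows(x), 1, .)
    for (i = 1; i < rows(info); i++) {
        m = vecmedian(x[|info[i, 1] \ info[i + 1, 2]|])
        out[|info[i, 1] \ info[i, 2]|] = J(info[i, 2] - info[i, 1] + 1, 1, m)
    }
    st_store(., outvar, out)
}

pairmedians("group", "value", "mata_median")
end
```

As in the loop, the last group is left missing because it has no successor.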
> >
> > Here's the simulation (syntax is -ahsim obs number_of_groups-).
> >
> > cap prog drop ahsim
> >
> > prog ahsim
> >     args obs ngroups
> >     * defaults: 10,000 observations in 10 groups
> >     if "`obs'" == "" loc obs 10000
> >     if "`ngroups'" == "" loc ngroups 10
> >     clear all
> >     * equal-sized groups with standard normal values
> >     qui {
> >         set obs `ngroups'
> >         g group = _n
> >         expand `obs'/`ngroups'
> >         sort group
> >         g value = rnormal()
> >     }
> >     di _n "{txt}Groups: `ngroups'"
> >     di "{txt}Obs.: " _N
> >
> >     timer clear
> >
> >     * timer 1: loop over consecutive pairs of groups
> >     timer on 1
> >     su group, meanonly
> >     local last = r(max) - 1
> >
> >     qui gen mymedian = .
> >
> >     qui forval i = 1/`last' {
> >         local j = `i' + 1
> >         su value if inlist(group, `i', `j') , detail
> >         replace mymedian = r(p50) if group == `i'
> >     }
> >     timer off 1
> >
> >     * timer 2: -egen- over the two possible pairings of groups
> >     timer on 2
> >     g int newgroup1 = cond(mod(group, 2), group, group-1)
> >     g int newgroup2 = cond(mod(group, 2), group-1, group)
> >     bys newgroup1 : egen med1 = median(value)
> >     bys newgroup2 : egen med2 = median(value)
> >     g median = cond(mod(group, 2), med1, med2)
> >     drop newgroup1 newgroup2 med1 med2
> >     timer off 2
> >
> >     timer list
> >
> >     di _n "{txt}1: loop"
> >     di "{txt}2: egen"
> > end
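
A small check, not part of the original exchange, that the two approaches timed above give the same medians. The toy data and the variable names loopmed, pairodd, paireven, medodd, medeven and egenmed are assumptions for illustration. It also shows one difference: the loop leaves the last group missing, while the -egen- version fills it with that group's own median because the last group has no partner.

```stata
* Toy data: 6 groups of 5 observations (illustrative only)
clear
set seed 2011
set obs 30
gen int group = ceil(_n/5)
gen double value = rnormal()

* Loop approach: median of groups i and i+1, stored for group i
quietly summarize group, meanonly
local last = r(max) - 1
gen double loopmed = .
forvalues i = 1/`last' {
    quietly summarize value if inlist(group, `i', `i' + 1), detail
    quietly replace loopmed = r(p50) if group == `i'
}

* -egen- approach: pair odd groups with their successor, then even groups
gen int pairodd  = cond(mod(group, 2), group, group - 1)
gen int paireven = cond(mod(group, 2), group - 1, group)
bysort pairodd:  egen double medodd  = median(value)
bysort paireven: egen double medeven = median(value)
gen double egenmed = cond(mod(group, 2), medodd, medeven)

* The two agree wherever the loop assigns a value
assert reldif(loopmed, egenmed) < 1e-12 if group <= `last'

* Observations filled by -egen- but left missing by the loop (last group)
count if missing(loopmed) & !missing(egenmed)
```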
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
--
Dipl.-Kffr. Sarah Reuter
Friedrich-Schiller-Universität Jena
Wirtschaftswissenschaftliche Fakultät
Lehrstuhl für Allgemeine Betriebswirtschaftslehre,
insbesondere Finanzierung, Banken und Risikomanagement
Carl-Zeiss-Str. 3
07743 Jena
Tel.: +49 (0)3641 9 43123
Fax: +49 (0)3641 9 43122