Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: faster xtiling
From: László Sándor <[email protected]>
To: [email protected]
Subject: Re: st: faster xtiling
Date: Fri, 7 Sep 2012 12:16:55 -0400
Daniel, Maarten, thank you very much.
To start with Maarten's point, I'm sad to say I don't see why Stata
would skip the sort that is in xtile's Makequan subroutine. Maybe I
overlooked something. But could this be the source of Caskey's speedup
too?
Otherwise, it would be very surprising for Caskey's extensive use of
-egen- to beat a well-written -_pctile- in C code. Strange.
I am also worried about Caskey's version not returning exactly the same
categories. Would -xtile- without the sort still do so?
I am not in the business of accusing StataCorp of sloppiness or
laziness. Maybe there is some arbitrariness in what _pctile does (with
ties?), so that they need a preceding sort to make the result
reproducible? Not worth worrying about in my case, I think, so I'd just
drop the sort.
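
To make "dropping the sort" concrete, here is a minimal sketch under
stated assumptions, not the code of -xtile- or of xtileJ. It assumes
that -_pctile- leaves its cutpoints in r(r1), r(r2), ..., that the
number of quantiles is small enough for them all to fit there, and that
there are no weights. The names q_nosort, q_ref and cut1, cut2, ... are
made up for illustration, and the auto data stands in for the real data.

```stata
* Sketch only: quantile categories from -_pctile- cutpoints, without sorting.
sysuse auto, clear
local nq 10
local last = `nq' - 1

* Compute the cutpoints once and copy them out of r() into scalars
_pctile price, nq(`nq')
forvalues i = 1/`last' {
    scalar cut`i' = r(r`i')
}

* Start every nonmissing observation in the top category, then walk down
* through the cutpoints; each observation ends in the lowest category
* whose cutpoint it does not exceed.
generate int q_nosort = `nq' if !missing(price)
forvalues i = `last'(-1)1 {
    quietly replace q_nosort = `i' if price <= scalar(cut`i')
}

* Compare with official -xtile-; count reports any disagreements
xtile q_ref = price, nq(`nq')
count if q_nosort != q_ref
```

The loop makes one pass over the data per cutpoint, so this only pays
off when the number of quantiles is modest. Whether the categories
agree with -xtile- in the presence of ties is exactly what the final
-count- is there to check.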
Thanks!
Laszlo
On Fri, Sep 7, 2012 at 11:55 AM, Daniel Brodback <[email protected]> wrote:
>
> László,
>
> while I am no "learned" or experienced member of this list and have barely
> any Stata experience, I found that Judson Caskey's version, xtileJ (you can
> find it on his page at
> http://personal.anderson.ucla.edu/judson.caskey/data.html ), gets the
> quantile job done far more efficiently. (We are talking minutes vs. hours.)
>
> Maybe you can use his version as starting point for something of your own.
> Comparing the quantile ranks from xtile and xtileJ, my sample shows a
> correlation of .996.
>
> HTH,
> Daniel
>
>
> -------- Original Message --------
> > Date: Fri, 7 Sep 2012 17:50:15 +0200
> > From: Maarten Buis <[email protected]>
> > To: [email protected]
> > Subject: Re: st: faster xtiling
>
> > On Fri, Sep 7, 2012 at 5:04 PM, László Sándor wrote:
> > > I am trying to speed up -xtile- for Stata 11 and above for all
> > > platforms (for internal use) used with tens of millions of
> > > observations.
> > >
> > > I checked the source of -xtile-, and I am not sure I understand
> > > everything it does. Most importantly, it sorts the data (a no-no with
> > > data the size of mine), even though the crucial step, _pctile, does
> > > not need presorted data.
> >
> > The sorting only happens if you asked for more than 1,001 quantiles,
> > so that suggests to me that there is some limitation in _pctile that
> > makes that necessary. If it were just laziness/sloppiness, then it
> > would be extremely unlikely that the code would have been written that
> > way.
> >
> > > And while I am at it, I am also happy to hear comments about the
> > > prospects of using Mata for any of this. _pctile is built-in,
> > > optimized, tailored, tweaked, polished C code, so there is little hope
> > > that Mata might improve the crucial steps, right?
> >
> > As to the properties of -_pctile-, only StataCorp can say anything about
> > that, as we cannot see its content any more than you can.
> >
> > -- Maarten
> >
> > ---------------------------------
> > Maarten L. Buis
> > WZB
> > Reichpietschufer 50
> > 10785 Berlin
> > Germany
> >
> > http://www.maartenbuis.nl
> > ---------------------------------
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/