Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: new package -fastxtile- available in SSC

From	Michael Stepner <[email protected]>
To	[email protected]
Subject	Re: st: new package -fastxtile- available in SSC
Date	Mon, 7 Oct 2013 08:52:24 -0400

Thanks for letting me know, David.  I'm going to get to the bottom of
this, and release an update that corrects my claims accordingly.

On a first pass in the few minutes I have this morning, it seems to be
a numerical precision issue.  I added a -return list- after -fastxtile-
in your code, and then compared the reported quantile boundaries to
the discrepant observations identified by -list if xt != fxt-.  In each
case you documented, the observation that "hops the fence" matches one
of the quantile boundaries to eight significant digits.  The first
thing I'll check is whether the difference arises because one of
-xtile- and -fastxtile- uses a float where the other uses a double.
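
A minimal sketch of how a float/double mismatch could make a value
switch sides of a cutpoint.  The variable names (x, cut_d, cut_f,
above_d, above_f) are hypothetical, the value 0.1 is chosen only
because it has no exact binary representation, and nothing here is
taken from the -xtile- or -fastxtile- source:

```stata
// Illustration only: one value compared with a double and a float cutpoint
clear
set obs 1
gen float  x     = 0.1    // data stored as float
gen double cut_d = 0.1    // cutpoint held in double precision
gen float  cut_f = 0.1    // same cutpoint rounded to float

// 0.1 rounded to float is slightly larger than 0.1 as a double, so x
// lies above the double cutpoint but exactly on the float cutpoint
gen byte above_d = x > cut_d
gen byte above_f = x > cut_f
list x cut_d cut_f above_d above_f
```

Under these assumptions above_d is 1 and above_f is 0, so the same
observation would be assigned to different categories depending on
the storage type used for the boundary.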

Michael

On 7 October 2013 04:14, David Muller <[email protected]> wrote:
> Hi Michael,
>
> This looks great, and it is certainly much faster than built in
> -xtile- when operating on a lot of observations!
>
> One thing to note is that -fastxtile- does not necessarily produce
> identical results to -xtile-. This seems to occur for values that are
> essentially equal to a quantile cutpoint:
>
> **************************************
> clear
> set seed 300
> set obs 10
> gen x = rnormal()
> fastxtile fxt = x, nq(6)
> xtile xt = x, nq(6)
> // -capture noisily- reports a failed -assert- without halting a do-file
> capture noisily assert xt == fxt
> list if xt != fxt
>
> // And a larger example
> clear
> set obs 10000000
> gen x = rnormal()
> fastxtile fxt = x, nq(6)
> xtile xt = x, nq(6)
> capture noisily assert xt == fxt
> list if xt != fxt
> **************************************
>
>
> All the best,
> David
>
> On 6 October 2013 23:02, Michael Stepner <[email protected]> wrote:
>> -fastxtile- is a Stata routine to create a variable of quantile
>> categories.  It is now available in the SSC, with thanks to Kit Baum.
>>
>> fastxtile is a drop-in replacement for the built-in Stata program
>> xtile. It has the same syntax and produces identical results, but the
>> process has been altered to be more computationally efficient.  The
>> difference in running time is substantial in large datasets.
>>
>> fastxtile also has a few added features.  It supports computing the
>> quantile boundaries using a random sample of the data, which further
>> increases the speed, but generates approximate quantiles due to
>> sampling error.  fastxtile can also create categories based on a
>> user-specified numlist, rather than computing the quantile boundaries
>> itself.
>>
>> For anyone currently using -xtile- with large datasets, -fastxtile- is
>> worth checking out.  It has no downside, and runs significantly
>> faster.
>>
>> If you're interested, you can install the program via -ssc install fastxtile-.
>>
>> Best regards,
>> Michael
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
