Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: approximate quantiles in Stata
From
László Sándor <[email protected]>
To
[email protected]
Subject
Re: st: approximate quantiles in Stata
Date
Sat, 24 Aug 2013 12:03:39 -0400
Also, I received a quick comment from theoretical CS people that the
quantile thing could be faster than the quoted speed of sorting in
Stata (and xtile does sort): "It is possible to find the quantiles in
O(N) using a straightforward modification of the selection algorithm
based on quicksort (see
http://en.wikipedia.org/wiki/Selection_algorithm)."
http://cstheory.stackexchange.com/questions/18732/algorithms-to-split-data-into-roughly-equal-sized-quantiles/
Is there anything to know about QuickSort in Stata?
That said, I cannot write up a fancy algorithm to find the quantiles,
let alone a whole new -sort-, so probably the question is still about
sensible downsampling.
On Sat, Aug 24, 2013 at 7:42 AM, László Sándor <[email protected]> wrote:
> Thanks, David,
>
> I think the typical use case is about tens of millions of
> observations. (And as I think it matters for precision, the typical
> case is about 20 bins, or vingtiles.)
>
> FWIW, I also tried to profile -xtile- with maximum number of
> observations possible. With Stata 13 running on 64 cores, it took 7
> hours to generate vingtiles.
>
> Laszlo
>
> On Fri, Aug 23, 2013 at 9:54 PM, David Hoaglin <[email protected]> wrote:
>> Hi, Laszlo.
>>
>> How large are your samples, and which quantiles do you need?
>>
>> I think I saw some relevant work a number of years ago, and I will
>> have to look for it.
>>
>> David Hoaglin
>>
>> On Fri, Aug 23, 2013 at 9:01 PM, László Sándor <[email protected]> wrote:
>>> Hi,
>>> My work is slowed down by the precise but computationally intensive
>>> quantile calculation of Stata. I am curious if there are any
>>> approximation algorithms implemented out there, something along these
>>> ideas: http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/
>>>
>>> So this is not about estimating population quantiles from a small
>>> sample (see Nick's hdquantile on SSC, e.g.). This is about finding
>>> approximate quantiles in large data.
>>>
>>> If the answer is simply random downsampling before taking quantiles, I
>>> would still appreciate some guidance on how heavily to downsample as a
>>> function of population size.
>>>
>>> Thanks!
>>>
>>> Laszlo
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/