Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: approximate quantiles in Stata

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: approximate quantiles in Stata
Date	Sat, 24 Aug 2013 12:03:39 -0400

Also, I received a quick comment from theoretical CS people that the
quantile thing could be faster than the quoted speed of sorting in
Stata (and xtile does sort): "It is possible to find the quantiles in
O(N) using a straightforward modification of the selection algorithm
based on quicksort (see
http://en.wikipedia.org/wiki/Selection_algorithm)."
http://cstheory.stackexchange.com/questions/18732/algorithms-to-split-data-into-roughly-equal-sized-quantiles/

Is there anything to know about QuickSort in Stata?

That said, I cannot write up a fancy algorithm to find the quantiles,
let alone a whole new -sort-, so probably the question is still about
sensible downsampling.

On Sat, Aug 24, 2013 at 7:42 AM, László Sándor <[email protected]> wrote:
> Thanks, David,
>
> I think the typical use case is about tens of millions of
> observations. (And as I think it matters for precision, the typical
> case is about 20 bins, or vingtiles.)
>
> FWIW, I also tried to profile -xtile- with maximum number of
> observations possible. With Stata 13 running on 64 cores, it took 7
> hours to generate vingtiles.
>
> Laszlo
>
> On Fri, Aug 23, 2013 at 9:54 PM, David Hoaglin <[email protected]> wrote:
>> Hi, Laszlo.
>>
>> How large are your samples, and which quantiles do you need?
>>
>> I think I saw some relevant work a number of years ago, and I will
>> have to look for it.
>>
>> David Hoaglin
>>
>> On Fri, Aug 23, 2013 at 9:01 PM, László Sándor <[email protected]> wrote:
>>> Hi,
>>> My work is slowed down by the precise but computationally intensive
>>> quantile calculation of Stata. I am curious if there are any
>>> approximation algorithms implemented out there, something along these
>>> ideas: http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/
>>>
>>> So this is not about estimating population quantiles from a small
>>> sample (see Nick's hdquantile on SSC, e.g.). This is about finding
>>> approximate quantiles in large data.
>>>
>>> If the answer is simply random downsampling before taking quantiles, I
>>> would still appreciate some guidance on how heavily to downsample as a
>>> function of population size.
>>>
>>> Thanks!
>>>
>>> Laszlo
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- RE: st: approximate quantiles in Stata
  - From: Joe Canner <[email protected]>

References:
- st: approximate quantiles in Stata
  - From: László Sándor <[email protected]>
- Re: st: approximate quantiles in Stata
  - From: David Hoaglin <[email protected]>
- Re: st: approximate quantiles in Stata
  - From: László Sándor <[email protected]>

Prev by Date: Re: st: RE: Questions for Subsample
Next by Date: Re: st: How to test to heteroskedasticity after -vce(cluster clusvar)- command?
Previous by thread: Re: st: approximate quantiles in Stata
Next by thread: RE: st: approximate quantiles in Stata
Index(es):
- Date
- Thread