Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: László Sándor <sandorl@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: approximate quantiles in Stata
Date: Sat, 24 Aug 2013 07:42:16 -0400
Thanks, David. I think the typical use case is about tens of millions of observations. (And as I think it matters for precision, the typical case is about 20 bins, or vingtiles.)

FWIW, I also tried to profile -xtile- with the maximum number of observations possible. With Stata 13 running on 64 cores, it took 7 hours to generate vingtiles.

Laszlo

On Fri, Aug 23, 2013 at 9:54 PM, David Hoaglin <dchoaglin@gmail.com> wrote:
> Hi, Laszlo.
>
> How large are your samples, and which quantiles do you need?
>
> I think I saw some relevant work a number of years ago, and I will
> have to look for it.
>
> David Hoaglin
>
> On Fri, Aug 23, 2013 at 9:01 PM, László Sándor <sandorl@gmail.com> wrote:
>> Hi,
>> My work is slowed down by the precise but computationally intensive
>> quantile calculation of Stata. I am curious if there are any
>> approximation algorithms implemented out there, something along these
>> ideas: http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/
>>
>> So this is not about estimating population quantiles from a small
>> sample (see Nick's hdquantile on SSC, e.g.). This is about finding
>> approximate quantiles in large data.
>>
>> If the answer is simply random downsampling before taking quantiles, I
>> would still appreciate some guidance on how heavily to downsample as a
>> function of population size.
>>
>> Thanks!
>>
>> Laszlo
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
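On the question of how heavily to downsample: one standard distribution-free way to size a random subsample is the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality with Massart's constant. If n values are drawn independently (with replacement) from the data, then P(sup_x |F_n(x) - F(x)| > eps) <= 2*exp(-2*n*eps^2). So n >= ln(2/delta) / (2*eps^2) draws keep every sample quantile within roughly eps in rank of the corresponding full-data quantile, with probability at least 1 - delta. Note that n depends only on eps and delta, not on the number of observations. The sketch below is a minimal Python illustration of that random-downsampling approach. It is not q-digest and not an existing Stata command. The function names, the with-replacement sampling, and the default eps = 0.005 (half a percentile point) and delta = 0.01 are all illustrative assumptions.

```python
import math
import random


def dkw_sample_size(eps: float, delta: float) -> int:
    """Smallest n with 2*exp(-2*n*eps**2) <= delta (DKW bound, Massart constant).

    Hypothetical helper. With probability >= 1 - delta, the empirical CDF of
    n i.i.d. draws stays within eps of the full-data CDF at every point.
    The result does not depend on the population size.
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def approx_quantile_cutpoints(values, k, eps=0.005, delta=0.01, seed=None):
    """Approximate the k-1 interior cutpoints of k equal-frequency bins.

    Hypothetical helper: draws a with-replacement random sample sized by
    dkw_sample_size, sorts only that sample, and returns nearest-rank sample
    quantiles at p = j/k. Each cutpoint's rank in the full data should be
    within roughly eps of j/k, jointly with probability >= 1 - delta.
    """
    rng = random.Random(seed)
    n = dkw_sample_size(eps, delta)
    # Independent draws with replacement, so the DKW bound applies exactly
    # with F taken to be the empirical CDF of the full data.
    sample = sorted(rng.choices(values, k=n))
    cuts = []
    for j in range(1, k):
        # Nearest-rank index ceil(j*n/k) - 1, using integer arithmetic.
        idx = (j * n + k - 1) // k - 1
        cuts.append(sample[idx])
    return cuts


if __name__ == "__main__":
    # Illustrative stand-in for a variable with ten million observations.
    population = range(10_000_000)
    print("draws needed:", dkw_sample_size(0.005, 0.01))
    print("vingtile cutpoints:", approx_quantile_cutpoints(population, k=20, seed=1))
```

The design trade-off: drawing the subsample is one O(N) pass, and only about 10^5 values are sorted instead of all N. For vingtiles, whose bins are 5 percentile points wide, an eps of 0.005 allows each cutpoint to be off by up to about a tenth of a bin. Observations near a cutpoint may therefore land in the neighbouring bin, so the bins are only approximately equal-frequency. Tightening eps by a factor of 10 multiplies the required draws by 100.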