Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | László Sándor <sandorl@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: approximate quantiles in Stata |
Date | Sun, 25 Aug 2013 18:39:26 -0400 |
Thanks again, David.

Actually, I convinced myself that this can be a simpler problem than it
seemed. See my tentative answer on CV:
http://stats.stackexchange.com/a/68308/6534
using
http://www.math.mcgill.ca/~dstephens/OldCourses/556-2006/Math556-Median.pdf

I like to use binned scatter plots for clarity, with equal-sized bins.
The binning was a huge bottleneck in large samples.

On Sun, Aug 25, 2013 at 4:54 PM, David Hoaglin <dchoaglin@gmail.com> wrote:
> Laszlo,
>
> You're welcome.
>
> The comments about the quality of the sample seem rather vague. I
> didn't dig for a specific measure of "quality." Working with an
> incoming stream of data makes the problem more challenging. You're
> fortunate to have the entire "population" already.
>
> If the estimate of a quantile is to have a specified variance, the
> necessary sample size will be larger for more-extreme quantiles.
>
> You have not explained what you plan to do with your 20 bins. A
> sample of suitable size would give you estimates of the boundaries of
> the bins (i.e., the 19 quantiles). Then a single pass over the
> population would give you the exact number of data values in each bin.
>
> David Hoaglin
>
> On Sun, Aug 25, 2013 at 11:26 AM, László Sándor <sandorl@gmail.com> wrote:
>> Thanks, David.
>>
>> I think I found a reference about quantiles from downsampling, only
>> with a little clarification needed. I think I see the point about why
>> the size of the sample matters and not the sampling rate. See the
>> discussion in this CStheory answer:
>> http://cstheory.stackexchange.com/a/18734/17375
>>
>> Or staying closer to our stats brethren, I edited my question on Cross
>> Validated with my current concerns:
>> http://stats.stackexchange.com/questions/68208/how-should-sampling-ratios-to-estimate-quantiles-change-with-population-size
>>
>> On the point about the tails: I think equal-sized bins provide a good
>> summary of a distribution, or binned scatter plots of correlations.
>> Unequal bins can confuse people just as much about the unequal
>> precision of means of the bins (true, it is not only the bin size that
>> drives it, the tails will have larger standard deviations).
>>
>> But thanks!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
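
The two-step approach in David's quoted message (estimate the 19 bin
boundaries from a sample, then make a single pass over the full data to
get exact bin counts) can be sketched roughly as follows. This is only an
illustration in Python, not Stata code from the thread. The function names
`approx_bin_edges` and `count_bins`, the default sample size of 10,000,
and the simple order-statistic quantile rule are all assumptions made for
the example.

```python
import bisect
import random


def approx_bin_edges(population, n_bins=20, sample_size=10_000, seed=0):
    """Estimate the n_bins - 1 interior quantiles from a simple random sample.

    Hypothetical helper: the sample size and the quantile rule are
    illustrative choices, not values taken from the thread.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(population))
    sample = sorted(rng.sample(population, k))
    # Simple order-statistic rule; other quantile definitions would also work.
    return [sample[min(k - 1, (j * k) // n_bins)] for j in range(1, n_bins)]


def count_bins(population, edges):
    """Single pass over the full data: exact count of values in each bin."""
    counts = [0] * (len(edges) + 1)
    for x in population:
        # bisect_right places a value equal to an edge in the upper bin.
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(1)
    population = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
    edges = approx_bin_edges(population)
    counts = count_bins(population, edges)
    # The bins are only approximately equal-sized; the counts show by how much.
    print(len(edges), min(counts), max(counts))
```

Only the sample is sorted, so the cost of finding the boundaries depends on
the sample size rather than on the size of the full data set. The exact
counts from the second pass show how far the bins are from equal-sized.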