Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: approximate quantiles in Stata


From   László Sándor <[email protected]>
To   [email protected]
Subject   Re: st: approximate quantiles in Stata
Date   Sun, 25 Aug 2013 18:39:26 -0400

Thanks again, David.

Actually, I convinced myself that this can be a simpler problem than
it seemed. See my tentative answer on CV:
http://stats.stackexchange.com/a/68308/6534
using
http://www.math.mcgill.ca/~dstephens/OldCourses/556-2006/Math556-Median.pdf

I like to use binned scatter plots for clarity, with equal-sized bins. The
binning was a huge bottleneck in large samples.
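
For concreteness, here is a minimal sketch of the two-step approach David
describes in the message quoted below: estimate the 19 bin boundaries from a
random sample, then make one pass over the full data to get exact bin counts.
It is in Python rather than Stata purely for illustration, and the function
names (estimate_boundaries, count_bins), the sample size, and the simulated
normal data are all assumptions of this sketch, not anything from the thread.

```python
import bisect
import random


def estimate_boundaries(values, sample_size, n_bins=20, seed=0):
    """Estimate the n_bins - 1 interior quantiles from a simple random sample."""
    rng = random.Random(seed)
    # Sorting only the sample is what avoids sorting the full data set.
    sample = sorted(rng.sample(values, sample_size))
    # Boundary k is the sample order statistic near the k/n_bins quantile.
    return [sample[(k * sample_size) // n_bins] for k in range(1, n_bins)]


def count_bins(values, boundaries):
    """Single pass over the full data: exact count of values in each bin."""
    counts = [0] * (len(boundaries) + 1)
    for v in values:
        # bisect_right puts a value equal to a boundary into the upper bin.
        counts[bisect.bisect_right(boundaries, v)] += 1
    return counts


# Hypothetical data: 100,000 standard normal draws standing in for the
# "population"; the boundaries come from a sample of 5,000.
data_rng = random.Random(1)
population = [data_rng.gauss(0, 1) for _ in range(100_000)]
cuts = estimate_boundaries(population, sample_size=5_000)
counts = count_bins(population, cuts)
print(len(cuts), sum(counts))
print(min(counts), max(counts))
```

The bins will be only approximately equal-sized, since the boundaries are
estimates, but the counts from the second pass are exact, so one can see
directly how far off the sample-based boundaries are.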

On Sun, Aug 25, 2013 at 4:54 PM, David Hoaglin <[email protected]> wrote:
> Laszlo,
>
> You're welcome.
>
> The comments about the quality of the sample seem rather vague.  I
> didn't dig for a specific measure of "quality."  Working with an
> incoming stream of data makes the problem more challenging.  You're
> fortunate to have the entire "population" already.
>
> If the estimate of a quantile is to have a specified variance, the
> necessary sample size will be larger for more-extreme quantiles.
>
> You have not explained what you plan to do with your 20 bins.  A
> sample of suitable size would give you estimates of the boundaries of
> the bins (i.e., the 19 quantiles).  Then a single pass over the
> population would give you the exact number of data values in each bin.
>
> David Hoaglin
>
> On Sun, Aug 25, 2013 at 11:26 AM, László Sándor <[email protected]> wrote:
>> Thanks, David.
>>
>> I think I found a reference about quantiles from downsampling, only
>> with a little clarification needed. I think I see the point about why
>> the size of the sample matters and not the sampling rate. See the
>> discussion in this CStheory answer:
>> http://cstheory.stackexchange.com/a/18734/17375
>>
>> Or staying closer to our stats brethren, I edited my question on Cross
>> Validated with my current concerns:
>> http://stats.stackexchange.com/questions/68208/how-should-sampling-ratios-to-estimate-quantiles-change-with-population-size
>>
>> On the point about the tails: I think equal-sized bins provide a good
>> summary of a distribution, or binned scatter plots of correlations.
>> Unequal bins can confuse people just as much about the unequal
>> precision of the bin means (true, it is not only the bin size that
>> drives it; the tails will have larger standard deviations).
>>
>> But thanks!
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC