Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: approximate quantiles in Stata
From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: st: approximate quantiles in Stata
Date: Sun, 25 Aug 2013 16:54:08 -0400
Laszlo,

You're welcome.

The comments about the quality of the sample seem rather vague. I
didn't dig for a specific measure of "quality." Working with an
incoming stream of data makes the problem more challenging. You're
fortunate to have the entire "population" already.

If the estimate of a quantile is to have a specified variance, the
necessary sample size will be larger for more-extreme quantiles.
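
To illustrate the point about more-extreme quantiles, here is a rough sketch in Python. It is not from the thread: it assumes the data are roughly standard normal and uses the usual large-sample approximation Var(Q_p) ~ p(1-p) / (n * f(Q_p)^2), where f is the density at the quantile. The function name approx_sample_size and the target variance are hypothetical choices for illustration only.

```python
from statistics import NormalDist


def approx_sample_size(p, target_variance, dist=NormalDist()):
    """Approximate n so that the sample p-quantile has the target variance.

    Assumption: the large-sample approximation
        Var(Q_p) ~ p * (1 - p) / (n * f(Q_p) ** 2)
    with f taken from `dist` (standard normal by default).
    """
    q = dist.inv_cdf(p)      # population quantile under the assumed distribution
    density = dist.pdf(q)    # density at that quantile
    return p * (1.0 - p) / (target_variance * density ** 2)


# Same target variance (on the data scale), increasingly extreme quantiles.
for p in (0.50, 0.75, 0.95):
    print(p, round(approx_sample_size(p, target_variance=1e-4)))
```

Under these assumptions the required n grows as p moves away from 0.5, because the density at the quantile falls faster than p(1-p) does. With a different distribution, or with variance measured on the probability scale rather than the data scale, the numbers would differ.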

You have not explained what you plan to do with your 20 bins. A
sample of suitable size would give you estimates of the boundaries of
the bins (i.e., the 19 quantiles). Then a single pass over the
population would give you the exact number of data values in each bin.

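
The two steps just described could be sketched as follows. This is only an illustration in Python, not Stata code from the thread: the simulated "population", the sample size of 10,000, and the helper names estimate_cutpoints and count_bins are all assumptions made for the example.

```python
import bisect
import random
import statistics


def estimate_cutpoints(population, sample_size, n_bins=20, seed=0):
    """Estimate the n_bins - 1 bin boundaries from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    # statistics.quantiles returns n - 1 cut points, in increasing order.
    return statistics.quantiles(sample, n=n_bins)


def count_bins(population, cutpoints):
    """Single pass over the population: exact count of values in each bin."""
    counts = [0] * (len(cutpoints) + 1)
    for x in population:
        # A value equal to a boundary is assigned to the lower bin.
        counts[bisect.bisect_left(cutpoints, x)] += 1
    return counts


# Hypothetical population, simulated here only so the sketch can run.
rng = random.Random(1)
population = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

cutpoints = estimate_cutpoints(population, sample_size=10_000)
counts = count_bins(population, cutpoints)
print(len(cutpoints), "boundaries;", "bin counts:", counts)
```

Because the boundaries come from a sample, the bins will be only approximately equal in size, but the counts themselves are exact for those boundaries.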
David Hoaglin
On Sun, Aug 25, 2013 at 11:26 AM, László Sándor <[email protected]> wrote:
> Thanks, David.
>
> I think I found a reference about quantiles from downsampling, only
> with a little clarification needed. I think I see the point about why
> the size of the sample matters and not the sampling rate. See the
> discussion in this CStheory answer:
> http://cstheory.stackexchange.com/a/18734/17375
>
> Or staying closer to our stats brethren, I edited my question on Cross
> Validated with my current concerns:
> http://stats.stackexchange.com/questions/68208/how-should-sampling-ratios-to-estimate-quantiles-change-with-population-size
>
> On the point about the tails: I think equal-sized bins provide a good
> summary of a distribution, or binned scatter plots of correlations.
> Unequal bins can confuse people just as much about the unequal
> precision of means of the bins (true, it is not only the bin size that
> drives it; the tails will also have larger standard deviations).
>
> But thanks!