Nick et al.--
The sample median might not be zero once sample weights are taken
into account, for example if zeros tend to have very low relative
weight and nonzero cases have relatively high weights. Since we are
not given the weights, we cannot be sure. Depending on the weights,
the data might look a lot less or a lot more skewed than the
unweighted tab seems to imply! In any case, examples or simulations
(to explore coverage and small-sample bias) should include weights
and clusters if possible, whether estimating the overall mean or the
proportion nonzero and the mean or median of the nonzero cases, as in
http://www.stata.com/statalist/archive/2009-11/msg01354.html
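
To make the point about weights concrete, here is a minimal sketch in
Python (not Stata) with made-up numbers. The data, the weights, and the
helper weighted_median are all hypothetical; the helper uses one common
definition of a weighted median (the smallest value at which the
cumulative weight reaches half the total), which need not match what
any particular survey command computes.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("empty input")

# Hypothetical sample: seven zeros and three positive values.
values = [0, 0, 0, 0, 0, 0, 0, 5, 20, 100]
# Hypothetical weights: zeros get low weight, nonzero cases high weight.
weights = [1, 1, 1, 1, 1, 1, 1, 10, 10, 10]

unweighted = weighted_median(values, [1] * len(values))
weighted = weighted_median(values, weights)
print(unweighted, weighted)  # prints "0 20"
```

With equal weights the median is zero, as in the unweighted tab, but
with these invented weights it moves into the nonzero cases.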
On Thu, Nov 26, 2009 at 5:34 AM, Nick Cox wrote:
> Jay makes an interesting point, although in turn it can be restated to
> acknowledge that the central limit theorem comes in numerous different
> flavours depending on quite what assumptions are being made. (For
> example, there are flavours allowing various kinds of dependence.)
> Alternatively, purists might want to talk of a family of central limit
> theorems.
>
> However, my guess is that this is not the central issue. (That pun was
> unintentional in my first draft and deliberate in my second.) Although
> with lots of zeros and strong skew the distribution concerned is awkward
> practically, I'd be surprised if it was pathological mathematically, or
> indicative of an underlying distribution that was. The point could be
> explored a little by e.g. bootstrapping.
>
> The median in the sample data was clearly zero!
>
> Nick
>
> Verkuilen, Jay
>
> Kieran McCaul wrote:
>
>>The skew in the data does not stop you from calculating the mean, nor
> does it stop you from calculating a 95% CI around the mean.
> Regardless of the skew in the data, the sampling distribution of the
> mean will be Normal.<
>
> Not true. It will tend towards normality (in the sense of convergence in
> distribution) assuming regularity conditions for the central limit
> theorem hold, which for highly skewed variables is often NOT the case.
> But that convergence may be VERY slow and the resulting confidence
> interval for the mean may be extremely poor (incredibly wide) or even
> ludicrous (e.g., below the lower bound of the data).
>
> I would wonder whether the original poster might want to estimate a
> median instead of a mean?
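
Jay's point about slow convergence can be explored with a small
simulation. The sketch below is in Python (not Stata) and rests on
assumptions that are mine, not the original poster's: a mix of 70%
zeros and lognormal nonzero values, n = 30, simple random sampling,
and a normal-theory interval (mean +/- 1.96 SE). It ignores weights
and clusters, so it is only a starting point for the fuller
simulations suggested above.

```python
import math
import random
import statistics

def simulate_coverage(n=30, reps=2000, p_zero=0.7, mu=0.0, sigma=1.5, seed=12345):
    """Share of nominal 95% normal-theory intervals that cover the true mean."""
    rng = random.Random(seed)
    # True mean of the mixture: P(nonzero) * E[lognormal(mu, sigma)].
    true_mean = (1.0 - p_zero) * math.exp(mu + sigma ** 2 / 2.0)
    covered = 0
    for _ in range(reps):
        # Each draw is zero with probability p_zero, otherwise lognormal.
        sample = [
            0.0 if rng.random() < p_zero else rng.lognormvariate(mu, sigma)
            for _ in range(n)
        ]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            covered += 1
    return covered / reps

coverage = simulate_coverage()
print(f"coverage of nominal 95% interval: {coverage:.3f}")
```

Comparing the printed coverage with the nominal 0.95, and rerunning
with larger n, gives a rough sense of how quickly the normal
approximation becomes adequate under these assumed settings.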