My take differs from everybody else's! From what you say, this is not a
spike. It is just strong skewness.
A spike in my book is a big group of identical values, in this context
usually lots of exact zeros or exact ones (or 100%s, naturally).
A good approximation is that if you take logits of a beta-distributed
variable, the distribution looks bell-shaped. That's true even for
highly skewed betas with modes near 0 or near 1. Here, as in many other
places, the logit works wonders. So, your proportion data are fit for a
beta model to the extent that their logits look bell-shaped. Of course,
you might end up fitting a mediocre model if you can't think of or fit a
better one.
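
As a rough illustration of that check, here is a minimal sketch in
Python (standard library only; it is not Stata and not part of
-betafit-). It simulates a skewed beta variable as a stand-in for real
proportions and compares the skewness before and after taking logits.
The shape parameters 20 and 2, the seed, and the names y, logit_y and
skewness are all assumptions made up for the example.

```python
import math
import random
import statistics


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def skewness(xs):
    """Moment-based sample skewness; near 0 for a symmetric sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


rng = random.Random(2718)  # arbitrary seed, for reproducibility only

# Hypothetical stand-in for the real data: beta(20, 2) has its mode at 0.95
y = [rng.betavariate(20, 2) for _ in range(5000)]

# Logits exist only strictly inside (0, 1), so drop any boundary values
y = [v for v in y if 0.0 < v < 1.0]
logit_y = [logit(v) for v in y]

skew_raw = skewness(y)
skew_logit = skewness(logit_y)
print(f"skewness of y:        {skew_raw:.2f}")
print(f"skewness of logit(y): {skew_logit:.2f}")
```

Skewness nearer zero after the logit is what you hope to see, although
it need not vanish. With real data, a histogram or normal quantile plot
of the logits is the more informative check.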
However, if you have any exact zeros or ones, you can't take logits, and
equivalently you can't really fit a beta. You need either a fudge that
denies that the zeros or ones really are exact, or a mixture model such
as others are referring to.
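
To make those two options concrete, here is a small Python sketch
(standard library only). The data values and all names are invented for
illustration. The squeeze (y * (n - 1) + 0.5) / n is just one possible
fudge, not a recommendation, and the split at the end is only the
bookkeeping that a mixture or two-part model would start from, not a
fitted model.

```python
import math


def logit(p):
    """Log odds; only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Invented proportions, including one exact zero and two exact ones
y = [0.0, 0.12, 0.5, 0.97, 0.999, 1.0, 1.0]

# Step 1: count the boundary values, where the logit is undefined
n_zero = sum(1 for v in y if v == 0.0)
n_one = sum(1 for v in y if v == 1.0)
print(f"exact zeros: {n_zero}, exact ones: {n_one}")

# Option A (a fudge): squeeze every value slightly toward 0.5 so that
# nothing sits exactly on 0 or 1, then take logits as usual
n = len(y)
squeezed = [(v * (n - 1) + 0.5) / n for v in y]
logit_squeezed = [logit(v) for v in squeezed]

# Option B (mixture-style bookkeeping): treat "exactly 0" and "exactly 1"
# as binary outcomes and keep the interior values for a separate model
is_zero = [int(v == 0.0) for v in y]
is_one = [int(v == 1.0) for v in y]
interior = [v for v in y if 0.0 < v < 1.0]
logit_interior = [logit(v) for v in interior]
print(f"interior values kept for the continuous part: {len(interior)}")
```

The squeeze changes every observation a little, so results can depend
on it; the split leaves the interior values untouched but needs a
separate model for the boundary indicators.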
Nick [not Nic]
Dan Weitzenfeld wrote:
I am trying to determine which testing factors drive a proportion
dependent variable, PercentNoise.
In searching the archives, I came across -betafit-, and a link to the
FAQ: "How do you fit a model when the dependent variable is a
proportion?" In that response, Allen McDowell and Nic Cox write, "In
practice, it is often helpful to look at the frequency distribution: a
marked spike at zero or one may well raise doubt about a single model
fitted to all data."
That describes my situation exactly: I have a marked spike in my
histogram at the top bin, roughly .95 - 1. I am wondering how to
account for this.
Does -betafit- take such a possibility into account?
Can someone briefly describe how I could use multiple models to fit
all the data, as implied in the FAQ response?
My fallback is setting a pass/fail bar and converting my proportions
to a binary, then using probit/logit. But the obvious drawback is
that I am throwing away information by collapsing the continuous
(albeit bounded) proportion variable to a binary.