Austin Nichols has already picked up the main
point, about logs and logits.
I have two further comments.
1. I find it helpful to keep straight
the distinction between _transformations_
and _link functions_, the latter being jargon
particular to the generalised linear models
literature, though not exclusive to it. For
example, a classic logit model with binary
outcomes 0 and 1 does not transform the
response, nor could it, as logit 0 and logit 1
are not defined (they would be minus and plus
infinity). Rather, the point is that the mean
response is modelled on the logit scale, so
that it varies between 0 and 1 but can attain
neither of those limits. More generally, in
-glm- the link function relates the mean of the
response to the linear predictor; it is not
strictly a transformation of the response itself.
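To make that concrete, here is a minimal sketch for a
fractional response y on [0,1] (y, x1, x2 are placeholder
names, not anyone's actual variables). The logit link in
-glm- models the mean of y directly; the transformation
route, by contrast, breaks down at exact 0s and 1s:

    glm y x1 x2, family(binomial) link(logit) robust
    predict muhat, mu            // fitted means, strictly inside (0,1)

    generate logity = logit(y)   // missing wherever y is exactly 0 or 1
    regress logity x1 x2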
2. I don't have a general recipe for responses
on (0,1), [0,1], (0,1] or [0,1) any more than
I do for responses on any other support (apart
from: always plot your data!). As usual,
there is a range of models with varying
assumptions, together with some experience and
some prejudices about how they behave under
departures from the ideal.
As a joint author of -betafit- I have some
affection for beta models, but affection gets
you nowhere in this field. -betafit- explicitly
ignores exact 0s and 1s, so it should be obvious
that it is quite inappropriate whenever they
occur. Other procedures do appear to work better
in those circumstances.
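By way of illustration only (y, x1, x2 are again placeholder
names, and the -betafit- syntax shown is from my memory of the
SSC version, so check -help betafit-), a count of boundary
values belongs before any beta fit:

    count if inlist(y, 0, 1)   // exact 0s and 1s a beta model cannot handle
    summarize y, detail
    betafit y, mu(x1 x2)       // defensible only if that count is zero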
The trickiest circumstance appears to be when
there is a spike at 0 or 1, or both, in the distribution.
For example, if the response is the fraction of assets
held in savings accounts, then presumably lots of
people have no savings accounts and will score exact zeros
if they are part of the sample. This comes up repeatedly
on the list. It is fascinating to observe the range of
attitudes: those who appear to assume that there
must be a transformation that will somehow fix this, those
who say, "Just leave them out", and those who are convinced
that the answer must be a two-part model. In any case,
seeking a panacea is not a good idea. The science
of what you are doing has to have the first call.
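For concreteness, one sketch of the two-part idea, with
placeholder names (frac for the fraction held, x1, x2 for
covariates); this illustrates that attitude and is not a
recommendation:

    // part 1: any savings account at all?
    generate byte any = (frac > 0) if !missing(frac)
    logit any x1 x2

    // part 2: a fractional logit on the positive fractions,
    // so that frac itself is never transformed
    glm frac x1 x2 if frac > 0, family(binomial) link(logit) robust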
Nick
[email protected]
Clive Nicholas
> Nick Cox wrote:
>
> > Some confusion here between logarithms and logits?
>
> If I'm thinking straight, you're arguing that the call to -glm,
> link(logit)- only makes sense for models whose dependent variables are
> already scaled 0-1, since the -link()- option does the transformation.
> That's certainly what you suggested to me here:
>
> http://www.stata.com/statalist/archive/2007-01/msg00315.html
>
> I feel a Statalist sequel coming on. I've just finished re-fitting a
> batch of fractional logit models to voting intention data after
> discovering that I had log-transformed the dependent variables when it
> wasn't necessary, largely thanks to re-reading the above post!
>
> If this is so, which is the most appropriate Stata routine with which
> to fit an LT-OLS regression model? Note that not everybody in my field
> thinks this to be a good idea, anyway; indeed Paolino's (2001)
> extensive Monte Carlo tests found that such models come off third best
> against pure OLS and beta-distributed models in terms of bias,
> efficiency and 'overconfidence', and across a range of distributions
> to boot. It was this paper that encouraged me to move away from such
> an approach.
>
> Paolino P (2001) "Maximum Likelihood Estimation of Models with
> Beta-Distributed Dependent Variables", Political Analysis 9(4):
> 325-46.