Ronnie Babigumira wrote:
> I was attending a workshop in which one of the presenters had a
> regression in which a dependent variable was a proportion. One of
> the participants noted that it was wrong but didn't follow it up
> with a clear explanation.
Presumably the argument was that, given predictor x,
a linear form a + bx with b not zero must predict response values
outside [0,1] for some x, so that at least in principle the functional
form cannot be appropriate. In practice, if response were (say)
proportion female and x were time, then the time at which the
predicted proportion passed outside the interval might be far outside
the range of the data, so that the problem need not bite. But there
are plenty of cases in which it does.
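As a rough illustration of that point, here is a minimal Python sketch that solves a + bx = 0 and a + bx = 1 for x. The function name exit_points and the coefficients are made up for the example and come from no real data set.

```python
def exit_points(a, b):
    """Return the x values at which the line a + b*x equals 0 and 1."""
    if b == 0:
        # A flat line stays at a for every x, so it never crosses either bound.
        raise ValueError("b must be nonzero for the line to leave [0, 1]")
    x_at_0 = (0.0 - a) / b
    x_at_1 = (1.0 - a) / b
    return x_at_0, x_at_1

# Hypothetical fit: proportion female = 0.30 + 0.004 * (years since 1950)
a, b = 0.30, 0.004
x0, x1 = exit_points(a, b)
print("prediction reaches 0 at x =", x0)
print("prediction reaches 1 at x =", x1)
```

With these invented numbers the line reaches 0 about 75 years before the origin and 1 about 175 years after it, so both crossings could easily lie well outside the observed period. A steeper slope or a proportion already near a bound would bring them inside it.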
This is most commonly mentioned, at least in my reading,
as a simple argument why a + bx is likely to be a poor form
for predicting responses which are either 0 or 1, an
argument which usually leads to a case for logit or
probit models. But the argument seems almost as strong
for proportions. And -- historically -- logit as a
transformation for continuous responses preceded logit
as (in modern terms) a link function for binary responses.
(The terminology of logit is more recent than its use.)
Generalised linear models offer a nice approach to this
question using e.g. logit link and some sensible family.
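To make the idea concrete, here is a small sketch in plain Python of fitting a logit-link model to proportions by Newton-Raphson. It assumes a binomial family (one "sensible family" among others) and a single predictor. The data and the names invlogit and fit_logit_link are invented for illustration. In practice one would use a packaged GLM routine, which would also supply standard errors.

```python
import math

def invlogit(eta):
    """Inverse of the logit link: maps any real eta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logit_link(x, y, iterations=25):
    """Fit mean(y) = invlogit(a + b*x) by Newton-Raphson.

    Uses the binomial score equations, which remain usable when y is a
    proportion in [0, 1] rather than a 0/1 indicator.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        mu = [invlogit(a + b * xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]            # binomial variance function
        # Score vector (gradient of the log quasi-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # 2 x 2 information matrix
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^(-1) times the score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Made-up proportions observed at six values of x
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.10, 0.22, 0.35, 0.55, 0.70, 0.88]
a_hat, b_hat = fit_logit_link(x, y)

# Predictions stay inside (0, 1) even far outside the range of the data
far_low = invlogit(a_hat + b_hat * -10.0)
far_high = invlogit(a_hat + b_hat * 20.0)
print(a_hat, b_hat, far_low, far_high)
```

The point of the sketch is the last few lines: however far x is pushed, the fitted curve approaches 0 or 1 without crossing, which is exactly what a + bx cannot guarantee.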
There is a FAQ with further comments, "How does one estimate a
model when the dependent variable is a proportion?", at
http://www.stata.com/support/faqs/stat/logit.html
Nick