Ronnie Babigumira asked whether linear regression was appropriate for a
proportion. Many wrote back to point out that proportions involve
binary data and that linear regression is for continuous outcomes. Ronnie
then clarified that the proportion was a single value between 0 and 1
for each observation, in this case, the percentage of field space
allocated to new variety maize for each farmer.
My tuppence, with an open call for comment, is that many areas in
medical research and psychometrics have similar properties to the
problem Ronnie raises. For example, pain is often measured on a 0-100
scale; quality of life scales such as the SF36 convert various numerical
scores into a proportion of the maximum score to give a quality of life
score between 0 and 100. Biostatisticians have used linear regression for many
years without worrying too much about it, unless there was a particular
reason: as Nick Cox put it, it all depends on the data and the use to
which it is being put. If the dependent variable is normally distributed
with a mean of 0.5 and an SD of 0.1, linear regression is probably going
to work fine. If the dependent variable has many 0's and / or 1's, as
might well be the case with the maize data, you might have a problem,
in particular that your regression will predict values outside the
0 to 1 range. My
guess is that with the maize data, differences between, say, 55% and 65%
are neither important nor likely, as farmers will plant certain whole
areas with a particular crop. Thus you could categorize the data into
four bands (0-24.9%, 25%-49.9%, 50%-74.9%, 75%-100%) and then do an
ordinal regression.
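To make the concern concrete, here is a rough sketch in Python (not
Stata, and not Ronnie's data). Everything in it is an assumption for
illustration: the covariate x, the simulated shares, and the cut points
are all made up. It fits a straight line to a proportion that piles up
at 0 and 1, checks whether the fitted values stay between 0 and 1, and
then bins the proportion into the four bands suggested above. It stops
short of fitting the ordinal regression itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: x is some farmer-level covariate, share is the
# fraction of field space under the new variety. Clipping a noisy
# linear signal to [0, 1] produces many exact 0's and 1's.
n = 200
x = rng.uniform(0.0, 10.0, size=n)
latent = -0.5 + 0.2 * x + rng.normal(0.0, 0.3, size=n)
share = np.clip(latent, 0.0, 1.0)

# Ordinary least squares of share on x, with an intercept.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, share, rcond=None)

# Fitted values at the low end, middle, and high end of the x range.
x_new = np.array([0.0, 5.0, 10.0])
pred = beta[0] + beta[1] * x_new
print("fitted values:", pred)
print("outside [0, 1]:", (pred < 0.0) | (pred > 1.0))

# Four equal-width bands: [0, .25), [.25, .5), [.5, .75), [.75, 1].
edges = [0.25, 0.50, 0.75]
band = np.digitize(share, edges)          # codes 0, 1, 2, 3
counts = np.bincount(band, minlength=4)   # observations per band
print("observations per band:", counts)
```

The band codes are what would go into an ordinal model as the outcome;
in Stata that would be something like -ologit-.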
Andrew Vickers
Memorial Sloan-Kettering Cancer Center