Dear Laurel, Nick, Roger, Joao Pedro, Giorgio and Todd,
Many thanks for your comments. To put things in perspective, the presenter
was studying new maize varieties and sought to identify socioeconomic
factors that may explain the adoption of these varieties. All respondents in
her sample grew some maize (traditional, improved or both), so her dependent
variable was area under improved varieties (which could then be handled
easily in a censored regression framework or, better still, as a corner
solution outcome). However, she argued that the area allocated needed to be
adjusted for total area under maize: a farmer with 1 acre of maize land who
allocates 0.5 acres to the new varieties should not be treated, in terms of
adoption, the same as a farmer with 10 acres of maize land who also
allocates 0.5 acres. Hence the dependent variable was area under new maize
divided by total maize area (that is, the proportion).
From Laurel's email, it would seem that all the independent variables should
also be divided by the maize area, while Nick's email points out (correctly)
that although the dependent variable lies between 0 and 1, using OLS does not
guarantee that the predicted values of y will lie between 0 and 1 (which is
one of the main arguments against the Linear Probability Model). Roger
points to a binary dependent variable; however, the dependent variable here is
not quite binary. Joao Pedro suggests something that the presenter actually
did. I still need to think through Giorgio's suggestion, and I am going to
read through the paper suggested by Todd.
In light of the "added flesh" to the problem, I would appreciate your
comments on the best way to proceed (for example, would just including the
total maize area as one of the independent variables be a sufficient
control?).
If the Y-variable is a proportion rather than a binary variable, then you 
can still use either -regress- with Huber variances, or -glm- with identity 
link and binomial family, or even -glm- with log link and binomial family 
if you want multiplicative effects. The -glm- command will warn you that 
your Y-variable is not binary, but will still do as it is asked. The main 
problem with homoskedastic (equal-variance) linear regression is that, if 
the Y-variable is a proportion, then the conditional variance is not likely 
to be independent of the conditional mean, because proportions sampled from 
a distribution with a mean near 0.5 can vary more than proportions sampled 
from a distribution with a mean near 0 or 1. The -family- option of -glm- 
simply optimises the estimation under a particular assumption about 
mean-variance relationship, in order to minimize the width of the 
confidence intervals if that assumption is true. If you also use the 
-robust- option, then your standard errors will still be consistent, even 
if you do not guess the mean-variance relationship right first time. I 
myself would probably not simply use area under new maize as the Y-variable 
and area under total maize as an X-variable, because I would expect the 
effect of total maize area on area under new maize to be multiplicative 
rather than additive.
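
As a rough sketch only, the three options above might be typed as follows.
The variable names (newarea, maizearea, x1, x2) are hypothetical
placeholders for the presenter's data, not names taken from her dataset,
and prop_new is simply the proportion described earlier in this thread.

```stata
* Hypothetical variable names: newarea, maizearea, x1 and x2 are placeholders.
* Build the proportion of maize area under the new varieties.
generate double prop_new = newarea / maizearea

* Option 1: linear regression with Huber (robust) variances.
regress prop_new x1 x2, vce(robust)

* Option 2: identity link and binomial family (additive effects on the
* proportion). -glm- notes that the outcome is not binary but still fits.
glm prop_new x1 x2, family(binomial) link(identity) vce(robust)

* Option 3: log link and binomial family (multiplicative effects).
* -eform- reports exponentiated coefficients, readable as ratios.
glm prop_new x1 x2, family(binomial) link(log) vce(robust) eform
```

With the identity or log link, nothing constrains the fitted proportions to
lie between 0 and 1, so it may be worth checking the predicted values after
estimation, and either -glm- fit may have trouble converging on some data.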