Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Choosing a family using glm
From: Phil Schumm <[email protected]>
To: [email protected]
Subject: Re: st: Choosing a family using glm
Date: Tue, 24 Aug 2010 16:14:06 -0500

On Aug 24, 2010, at 12:10 PM, Laurie Molina wrote:

> I'm trying to fit a glm to get non-negative fitted values. I am
> thinking of using a glm with a log link, but I am not sure
> which family to use. Is there any test I can perform to choose
> between the normal and gamma distributions?

Everything Nick said is correct, of course -- I'll just expand a bit.
WRT the distributional family, what is most important is that the
variance function of the family (i.e., the way in which the variance
changes WRT the mean) is consistent with your data. For example, the
variance function for the Normal distribution is V(mu) = 1 (where mu
is E(Y) or the mean of Y), which corresponds to constant variance
(this is why you look for homoscedasticity in residual plots
after classical linear regression). In contrast, the variance
function for the gamma distribution is V(mu) = mu^2, which means that
the variance increases with the square of the mean (i.e., constant
coefficient of variation). The easiest (and in any case
indispensable) way to check if your variance function is plausible is
to plot the standardized residuals versus the fitted values and verify
that the spread appears roughly constant across the range of fitted
values; in some cases it might be helpful to examine a plot of the
absolute residuals versus the fitted values, with the aid of -lowess-.
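
As a rough illustration of the check described above, here is a minimal Python sketch (not Stata, and not part of the original post). It simulates gamma-distributed outcomes with a log-linear mean and compares the spread of residuals scaled by sqrt(V(mu)) in the lower and upper halves of the mean. The coefficients 5 and 0.8, the shape 4, and the function names are assumptions made up for the illustration, and the true means stand in for fitted values so that no model-fitting code is needed.

```python
import math
import random
import statistics


def simulate_gamma_rents(n=4000, shape=4.0, seed=12345):
    """Return (mu, y) pairs where y is gamma with mean mu and a fixed shape.

    With a fixed shape k, Var(y) = mu^2 / k, so the variance is proportional
    to mu^2 (the gamma variance function). All numbers here are hypothetical.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)
        mu = math.exp(5.0 + 0.8 * x)             # log link: log(mu) = 5 + 0.8 x
        y = rng.gammavariate(shape, mu / shape)  # mean = shape * scale = mu
        pairs.append((mu, y))
    return pairs


def spread_by_half(pairs, variance_function):
    """SD of (y - mu) / sqrt(V(mu)) in the lower and upper halves of mu."""
    ordered = sorted(pairs)  # sorts by mu first
    half = len(ordered) // 2

    def scaled_sd(chunk):
        scaled = [(y - mu) / math.sqrt(variance_function(mu)) for mu, y in chunk]
        return statistics.pstdev(scaled)

    return scaled_sd(ordered[:half]), scaled_sd(ordered[half:])


if __name__ == "__main__":
    data = simulate_gamma_rents()
    candidates = [
        ("V(mu) = 1 (Normal)", lambda mu: 1.0),
        ("V(mu) = mu^2 (gamma)", lambda mu: mu ** 2),
    ]
    for label, vf in candidates:
        low, high = spread_by_half(data, vf)
        print(f"{label}: SD low half = {low:.3f}, SD high half = {high:.3f}")
```

With data like these, scaling by V(mu) = 1 should leave clearly more spread in the upper half, while scaling by V(mu) = mu^2 should give about the same spread in both halves. That is the numerical counterpart of a residual plot whose band neither fans out nor narrows.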

> My data are rental prices of houses, so they are not count data and
> therefore I think I should not use Poisson.

Again, what's important is that you select a family whose variance
function is consistent with your data. For more information, see the
book Generalized Linear Models by McCullagh and Nelder.

> To my understanding, in a classical linear regression the assumption of
> normality is on the distribution of the error term, but in glm the
> assumption defined by the family selection is on the distribution of
> the dependent variable. Isn't that a huge cost for using glm instead
> of a classical linear regression model?

You are laboring under a misunderstanding. To say that the
distribution of Y conditional on X is Normal with mean XB and variance
sigma^2 is the same as saying that the distribution of the errors
(i.e., Y - XB) is Normal with mean 0 and variance sigma^2. And to
emphasize the GLM approach, what is most important (if you're fitting
a linear regression) is that the mean is XB and the variance is
constant (i.e., that your assumptions about the first and second
moments are correct).
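
To make the equivalence concrete, here is a small Python sketch (again an illustration, not part of the original post). It generates Y in both ways, directly from Normal(XB, sigma^2) and as XB plus a Normal(0, sigma^2) error, and then looks at Y - XB in each case. The intercept 1.0, slope 0.5, sigma = 2, and the function names are hypothetical choices.

```python
import random
import statistics


def linear_predictor(n, rng):
    """Hypothetical XB values: 1.0 + 0.5 * x with x uniform on [0, 10]."""
    return [1.0 + 0.5 * rng.uniform(0.0, 10.0) for _ in range(n)]


def draw_y_directly(xb, sigma, rng):
    """Y | X ~ Normal(XB, sigma^2): the 'distribution of Y' formulation."""
    return [rng.gauss(m, sigma) for m in xb]


def draw_y_via_errors(xb, sigma, rng):
    """Y = XB + e, e ~ Normal(0, sigma^2): the 'error term' formulation."""
    return [m + rng.gauss(0.0, sigma) for m in xb]


def error_summary(y, xb):
    """Mean and SD of the errors Y - XB."""
    errors = [yi - mi for yi, mi in zip(y, xb)]
    return statistics.fmean(errors), statistics.pstdev(errors)


if __name__ == "__main__":
    rng = random.Random(2010)
    sigma = 2.0
    xb = linear_predictor(5000, rng)
    for label, draw in [("Y ~ N(XB, sigma^2)", draw_y_directly),
                        ("Y = XB + e", draw_y_via_errors)]:
        mean_e, sd_e = error_summary(draw(xb, sigma, rng), xb)
        print(f"{label}: mean(Y - XB) = {mean_e:.3f}, sd(Y - XB) = {sd_e:.3f}")
```

In both cases the errors should have mean close to 0 and standard deviation close to sigma, since the two formulations describe the same model.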
-- Phil