Stata: Data Analysis and Statistical Software


Re: st: Choosing a family using glm

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: Choosing a family using glm
Date	Tue, 24 Aug 2010 16:14:06 -0500

On Aug 24, 2010, at 12:10 PM, Laurie Molina wrote:

I'm trying to fit a glm to get non-negative fitted values. I am thinking of using a glm with a log link, but I am not sure about which family to use. Is there any test I can perform to choose between the normal and gamma distributions?

Everything Nick said is correct, of course -- I'll just expand a bit. WRT the distributional family, what is most important is that the variance function of the family (i.e., the way in which the variance changes WRT the mean) is consistent with your data. For example, the variance function for the Normal distribution is V(mu) = 1 (where mu is E(Y) or the mean of Y), which corresponds to constant variance (i.e., this is why you look for homoscedasticity in residual plots after classical linear regression). In contrast, the variance function for the gamma distribution is V(mu) = mu^2, which means that the variance increases with the square of the mean (i.e., constant coefficient of variation). The easiest (and in any case indispensable) way to check if your variance function is plausible is to plot the standardized residuals versus the fitted values and verify that the amount of variation appears constant; in some cases it might be helpful to examine a plot of the absolute residuals versus the fitted values, together with the aid of -lowess-.
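As a rough illustration of the check described above, here is a minimal Stata sketch. It assumes the auto dataset shipped with Stata as a stand-in for the rent data (which are not shown in this thread), with price playing the role of rent and mpg and weight as arbitrary covariates; the new variable and graph names are likewise made up for the example.

```stata
* Sketch only: auto.dta, price, mpg and weight are stand-ins for the
* rent data and covariates, which do not appear in the thread.
sysuse auto, clear

* Gamma family with log link: variance function V(mu) = mu^2
glm price mpg weight, family(gamma) link(log)
predict double mu_gam, mu
predict double rs_gam, pearson standardized

* Gaussian family with log link: variance function V(mu) = 1
glm price mpg weight, family(gaussian) link(log)
predict double mu_norm, mu
predict double rs_norm, pearson standardized

* Standardized residuals versus fitted values, one graph per family
scatter rs_gam mu_gam, yline(0) name(gam_resid, replace)
scatter rs_norm mu_norm, yline(0) name(norm_resid, replace)

* Absolute residuals versus fitted values, with a lowess smooth
generate double abs_gam = abs(rs_gam)
generate double abs_norm = abs(rs_norm)
lowess abs_gam mu_gam, name(gam_abs, replace)
lowess abs_norm mu_norm, name(norm_abs, replace)
```

If the chosen variance function is plausible, the spread of the standardized residuals should look roughly constant across the fitted values, and the lowess curve through the absolute residuals should be roughly flat. A clear trend in either plot suggests trying the other family.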

My data are rent prices for houses, so they are not count data, and therefore I think I should not use Poisson.

Again, what's important is that you select a family whose variance function is consistent with your data. For more information, see the book Generalized Linear Models by McCullagh and Nelder.

To my understanding, in a classical linear regression the assumption of normality is on the distribution of the error term, but in glm the assumption defined by the family selection is on the distribution of the dependent variable. Isn't that a huge cost for using glm instead of a classical linear regression model?

You are laboring under a misunderstanding. To say that the distribution of Y conditional on X is Normal with mean XB and variance sigma^2 is the same as saying that the distribution of the errors (i.e., Y - XB) is Normal with mean 0 and variance sigma^2. And to emphasize the GLM approach, what is most important (if you're fitting a linear regression) is that the mean is XB and the variance is constant (i.e., that your assumptions about the first and second moments are correct).
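Written out, the equivalence above amounts to the following. The notation is the same as in the text (XB, sigma^2, mu, V(mu)); the only step added is that, given X, subtracting XB shifts the mean to zero and leaves the variance unchanged.

```latex
\begin{align*}
Y \mid X \sim N(XB,\ \sigma^2)
  &\iff (Y - XB) \mid X \sim N(0,\ \sigma^2), \\
E(Y \mid X) = \mu &= XB, \qquad
\operatorname{Var}(Y \mid X) = \sigma^2 \, V(\mu), \quad V(\mu) = 1 .
\end{align*}
```

The second line is the part that matters most for the fit: a correct mean and a constant variance.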



-- Phil
