Poisson and negative binomial regressions, along with their zero-inflated versions, are models for counts, not for levels of a continuous variable. That makes me think their use for this problem is dubious. Something on the order of a Tobit might be more appropriate.

David Greenberg, Sociology Department, New York University
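To make the Tobit idea concrete, here is a minimal sketch of the log-likelihood for a model left-censored at zero, which is how it would treat the zero-saving households. It is an illustration only, not a fitted model: the function names and the toy numbers are hypothetical, and in Stata the corresponding estimator is -tobit- with the ll(0) option.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(y, X, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero.

    y     : list of outcomes (zeros are treated as censored)
    X     : list of covariate rows, each the same length as beta
    beta  : coefficients of the latent linear index
    sigma : standard deviation of the latent normal error (> 0)
    """
    total = 0.0
    for yi, xi in zip(y, X):
        xb = sum(b * x for b, x in zip(beta, xi))
        if yi <= 0:
            # Censored observation: contributes P(latent y <= 0).
            total += math.log(normal_cdf(-xb / sigma))
        else:
            # Uncensored observation: contributes the normal density.
            total += math.log(normal_pdf((yi - xb) / sigma) / sigma)
    return total


# Hypothetical toy data: a constant plus one covariate, two zero outcomes.
y = [0.0, 0.0, 1.5, 3.0]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(tobit_loglik(y, X, beta=[-1.0, 1.0], sigma=1.0))
```

The zeros enter through a probability and the positive values through a density, so the model does not require the observed outcome itself to look normal. Whether the normal latent-error assumption is reasonable for savings data is a separate question that would need checking.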
----- Original Message -----
From: muhammed abdul khalid <[email protected]>
Date: Tuesday, February 2, 2010 3:08 pm
Subject: Re: st: What multiple regression model for extreme distributions
To: [email protected]
> Hi,
> Thank you for the replies.
>
> The data are cross-sectional, and saving is simply measured from
> respondents' answers about how much saving they have (in dollars), with
> the minimum being zero. There is no negative saving. Yes, saving is my
> dependent variable.
>
> I tried logit, zip, zinb, and nbreg, but their standard errors vary
> greatly. I am still unsure which model should be used. My objective is
> to predict the contribution of education, gender, location and
> ethnicity to the saving of the household.
>
> Thank you again for your kind responses.
>
> Muhammed
> SciencesPo Paris.
>
> 2010/2/2 Austin Nichols <[email protected]>:
> > You have had a number of good suggestions already, but as Nick Cox
> > points out, the distribution of the dependent variable is not all that
> > relevant to what model you choose; it is the distribution of the
> > dependent variable conditional on explanatory variables that is
> > important. Before you estimate a two-part "hurdle" or zero-inflated
> > model, I urge you to consider that the right set of explanatory
> > variables might well capture the reason for a large number of zero
> > outcomes (e.g. using -poisson- instead of -zip- etc.). When it comes
> > to household saving (I think that is your dependent variable, not
> > independent), you also want to consider debt. It may be the case that
> > households you are coding as zeros actually have negative saving
> > during the period under study. Do you have panel data, or
> > cross-sectional data? How is saving measured?
> >
> > On Tue, Feb 2, 2010 at 10:09 AM, <[email protected]> wrote:
> >> I have household income survey data (38,000 observations), and my
> >> problem is doing a multiple regression on saving (independent var)
> >> to ethnicity/strata/employment etc. (dependent var).
> >>
> >> The problem is this: 70% of my observations for the value of saving
> >> are zero. I recoded the zeros to 1 and logged them, but the
> >> distribution is still extremely skewed (mean 0.78, std dev 2.4,
> >> min 0, max 14). The histogram still looks like the letter L,
> >> extremely skewed to the right with a long tail. Obviously, OLS is
> >> out, and I tried Poisson (glm nbinomial), but the distribution is
> >> still not normally distributed. The data are in order, i.e. no
> >> missing values etc. They are clean. For some reason, lobit would
> >> not run.
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
> >
>
>
>
> --
> Muhammed