Re: st: What multiple regression model for extreme distributions
From: Michael Norman Mitchell <[email protected]>
To: [email protected]
Subject: Re: st: What multiple regression model for extreme distributions
Date: Tue, 02 Feb 2010 12:59:27 -0800
Dear Muhammed,
I think this may be more a question of theory than of statistics. The
great answers posted so far reflect, as I see it, different theoretical
assumptions about the nature of the outcome (saving) and how the
predictors relate to it. Each of these statistical suggestions could be
valid under a different theoretical framework. Returning to the
literature on the nature of "saving" for a theoretical basis would help
inform which statistical model to select. It would also be an
opportunity to see which statistical models have been accepted in past
publications.
I know this is more work, but if the aim is publication, it may be
worthwhile.
Best regards,
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell
muhammed abdul khalid wrote:
Hi,
Thank you for the replies.
The data are cross-sectional, and saving is measured from respondents'
answers on how much saving they have (in dollars), with the minimum
being zero. There is no negative saving. Yes, saving is my dependent
variable.
I tried logit, zip, zinb, and nbreg, but their standard errors vary
greatly. I am still unsure which model should be used. My objective is
to predict the contribution of education, gender, location and
ethnicity to household saving.
Thank you again for your kind responses.
Muhammed
SciencesPo Paris.
2010/2/2 Austin Nichols <[email protected]>:
You have had a number of good suggestions already, but as Nick Cox
points out, the distribution of the dependent variable is not all that
relevant to what model you choose; it is the distribution of the
dependent variable conditional on explanatory variables that is
important. Before you estimate a two-part "hurdle" or zero-inflated
model, I urge you to consider that the right set of explanatory
variables might well capture the reason for a large number of zero
outcomes (e.g. using -poisson- instead of -zip- etc.). When it comes
to household saving (I think that is your dependent variable, not
independent), you also want to consider debt. It may be the case that
households you are coding as zeros actually have negative saving
during the period under study. Do you have panel data, or
cross-sectional data? How is saving measured?
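
The point about marginal versus conditional distributions can be illustrated with a small simulation. The sketch below is not from the thread: it assumes a toy count outcome (not dollar amounts) and a single hypothetical binary covariate `x`, with made-up group means, and all function names are illustrative. Ignoring `x`, the outcome has far more zeros than a single Poisson with the overall mean would imply; within each level of `x`, the zero share is roughly what a plain Poisson predicts.

```python
import math
import random


def draw_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's method; adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n=20000, share_high=0.3, mean_low=0.05, mean_high=5.0, seed=1):
    """Simulate (x, y) pairs; all parameter values are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < share_high else 0
        rows.append((x, draw_poisson(mean_high if x else mean_low, rng)))
    return rows


def zero_share(values):
    """Fraction of observations equal to zero."""
    return sum(1 for v in values if v == 0) / len(values)


rows = simulate()
y_all = [y for _, y in rows]

# Marginal view: ignore x and compare the observed zero share with what
# one Poisson with the overall mean would imply, exp(-mean).
marginal_observed = zero_share(y_all)
marginal_expected = math.exp(-sum(y_all) / len(y_all))

# Conditional view: repeat the same comparison within each level of x.
conditional = {}
for group in (0, 1):
    y_g = [y for x, y in rows if x == group]
    conditional[group] = (zero_share(y_g), math.exp(-sum(y_g) / len(y_g)))

print("marginal: observed zeros %.3f, Poisson-implied %.3f"
      % (marginal_observed, marginal_expected))
for group, (observed, implied) in conditional.items():
    print("x=%d: observed zeros %.3f, Poisson-implied %.3f"
          % (group, observed, implied))
```

In this toy setup the apparent excess of zeros disappears once `x` is conditioned on, which is the situation in which a plain count model with the right covariates may suffice. Whether real saving data behave this way is an empirical question.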
On Tue, Feb 2, 2010 at 10:09 AM, <[email protected]> wrote:
I have household income survey data (38,000 observations), and my
problem is running a multiple regression of saving (independent var) on
ethnicity/strata/employment etc. (dependent var).
The problem is this: 70% of my observations have a saving value of
zero. I recoded the zeros to 1 and took logs, but the distribution is
still extremely skewed (mean 0.78, std dev 2.4, min 0, max 14). The
histogram still looks like the letter L, extremely skewed to the right
with a long tail. Obviously, OLS is out, and I tried Poisson (glm
nbinomial), but the distribution is still not normal.
The data are in order, i.e. no missing values etc. They are clean. For
some reason, lobit would not run.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/