I'm fitting a logistic regression with a non-negative, continuous independent variable X, for which about 60% of cases have X=0. Simply including X in the model seems problematic to me, since many cases with Y=0 and many others with Y=1 are likely to have X=0. I can think of two possible approaches to modeling X, and would like some feedback on them, along with any other thoughts on how to handle this situation.
a) Divide X into m categories and represent it with m-1 dummy variables in the model.
b) Include X in the model, and also include a binary variable Z such that Z=1 when X=0 and Z=0 otherwise. Then the coefficient of Z captures the shift in the log odds for cases with X=0, relative to what the X term alone would imply at X=0, and the coefficient of X gives the effect of X among cases with X>0 (since then Z=0).
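To make the two codings concrete, here is a minimal sketch in Python of the columns each approach would add to the model. The sample values, the cut points, and the helper names zero_indicator and category_dummies are all made up for illustration. Treating X=0 as its own reference category in (a) is one possible choice, not the only one.

```python
from bisect import bisect_right

# Hypothetical sample of X values; 60% are exactly zero.
x = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.2, 3.4, 7.9]

def zero_indicator(values):
    """Approach (b): Z = 1 when X == 0, otherwise 0."""
    return [1 if v == 0 else 0 for v in values]

def category_dummies(values, cuts):
    """Approach (a): m categories -> m-1 dummies, with X == 0 as reference.

    Category 0 is X == 0. Category k (k >= 1) is a positive X falling in
    the k-th interval defined by the ascending cut points in `cuts`.
    """
    m = len(cuts) + 2  # zero category plus len(cuts) + 1 positive intervals
    rows = []
    for v in values:
        cat = 0 if v == 0 else 1 + bisect_right(cuts, v)
        # One column per non-reference category.
        rows.append([1 if cat == k else 0 for k in range(1, m)])
    return rows

z = zero_indicator(x)                            # entered alongside x in (b)
dummies = category_dummies(x, cuts=[1.0, 5.0])   # assumed cut points for (a)
```

With these assumed cut points, (a) gives m=4 categories (X=0, 0<X<1, 1<=X<5, X>=5) and so three dummy columns, while (b) keeps X continuous and adds only the single column Z.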
Allan