Thanks to all who have helped out with my problem of having a large dataset
in which the primary predictor variable (a laboratory test) is censored at
zero. I think that I have come across a good solution, but have a couple of
questions.
The 'mfp' command (multivariable fractional polynomials) has a 'catzero'
option, which seems to take the zero values out of the fractional-polynomial
transformation and model them with a separate binary variable.
When I use it (with logit and the untransformed primary variable) on my
primary dataset, everything works exceedingly well, both in terms of fit and
in its choice of the simple log transformation of the primary predictor.
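To make concrete what I understand 'catzero' plus a log transformation to produce, here is a minimal sketch in Python (used purely for illustration, not Stata). It assumes the binary variable is 1 when the lab value is exactly zero, and that the log term is set to 0 for those observations, so the zeros are carried entirely by the indicator. It ignores any scaling or centering that 'mfp' may apply, and the function name catzero_log_terms is hypothetical.

```python
import math


def catzero_log_terms(x):
    """Return (zero_flag, log_term) for one lab value.

    Assumed coding (illustration only): zero_flag = 1 when x == 0, else 0;
    log_term = ln(x) when x > 0, else 0, so zero values contribute only
    through zero_flag. No scaling or centering is applied.
    """
    if x < 0:
        raise ValueError("lab values are assumed to be non-negative")
    if x == 0:
        return 1, 0.0
    return 0, math.log(x)


# Transform a small column of hypothetical lab values.
values = [0.0, 0.5, 1.0, 8.0]
rows = [catzero_log_terms(v) for v in values]
for v, (flag, lg) in zip(values, rows):
    print(f"x={v:<4} zero_flag={flag} log_term={lg:.4f}")
```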
Two questions:
(1) I divided my dataset into derivation and validation sets. Does anybody
know of an efficient way to fit the model on one set and then carry the
correct coefficients (for I*__0 and I*__1) over to the other dataset
(without re-running the regression, of course)? This works with fracpoly and
fracpredict, but does not seem to work with 'mfp'.
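Independent of how 'mfp' stores its results, the underlying idea is simply to keep the derivation-set coefficients fixed and evaluate the linear predictor on the validation observations. A minimal Python sketch of that idea, assuming a zero indicator plus a log term that is set to 0 for zero values; the coefficient values and key names are hypothetical, not anything 'mfp' produces. If the fitted model used scaled or centered terms, the same constants would have to be applied to the validation data before using the coefficients.

```python
import math

# Hypothetical coefficients, standing in for values saved from the
# derivation-set fit. The key names are illustrative, not mfp's names.
COEFS = {"const": -2.0, "zero_flag": 0.7, "log_term": 0.9}


def design_row(x):
    """Zero indicator plus log term for one lab value (assumed coding)."""
    if x < 0:
        raise ValueError("lab values are assumed to be non-negative")
    if x == 0:
        return {"zero_flag": 1.0, "log_term": 0.0}
    return {"zero_flag": 0.0, "log_term": math.log(x)}


def predict_prob(x, coefs=COEFS):
    """Predicted probability for a new observation using fixed coefficients."""
    row = design_row(x)
    eta = coefs["const"] + sum(coefs[name] * value for name, value in row.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Score a hypothetical validation set; nothing is re-estimated here.
validation_x = [0.0, 1.0, 2.5, 10.0]
validation_p = [predict_prob(x) for x in validation_x]
```

Because the coefficients are plain stored numbers, the same scoring function can be applied to any number of external datasets without touching the original fit.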
(2) I have a completely different dataset in which I would expect slightly
different coefficients, because it uses a different lab test with different
performance characteristics. For that one, 'mfp' does not pick the log
transformation on its own, so I have to force it with power(0); the fit is
still fine (it even seems to be better than that of the model it chooses).
Is there anything objectionable about doing this?
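As I understand it, forcing power(0) amounts to fitting an ordinary logistic regression on a fixed log term (plus the zero indicator), with no search over powers. A minimal Python sketch of such a fit by Newton-Raphson, using only the standard library and made-up data; the zero coding (indicator = 1 and log term = 0 when the value is zero) is an assumption, and the function names are hypothetical. The returned deviance could be set beside the deviance of the model 'mfp' selects.

```python
import math


def design_row(x):
    """Intercept, zero indicator, and log term (set to 0 when x == 0).

    Assumed coding for illustration; any scaling or centering is ignored.
    """
    if x < 0:
        raise ValueError("lab values are assumed to be non-negative")
    if x == 0:
        return [1.0, 1.0, 0.0]
    return [1.0, 0.0, math.log(x)]


def solve(a, b):
    """Solve a small dense system a @ s = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def fit_logit_fixed_log(xs, ys, max_iter=50, tol=1e-10):
    """Logistic regression on the fixed log term (power 0); no power search.

    Returns (coefficients, deviance), coefficients ordered as
    [intercept, zero indicator, log term].
    """
    rows = [design_row(x) for x in xs]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, ys):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
            w = mu * (1.0 - mu)
            for i in range(k):
                score[i] += (y - mu) * row[i]
                for j in range(k):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, score)  # Newton-Raphson step: (X'WX)^-1 X'(y - mu)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    deviance = 0.0
    for row, y in zip(rows, ys):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
        deviance -= 2.0 * (y * math.log(mu) + (1 - y) * math.log(1.0 - mu))
    return beta, deviance


# Made-up data: lab values (including zeros) and binary outcomes.
xs = [0, 0, 0, 0, 0.5, 0.5, 1, 1, 2, 2, 4, 4, 8, 8]
ys = [0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
beta, deviance = fit_logit_fixed_log(xs, ys)
```

With the indicator in the model, the fitted probability for the zero group reproduces the observed proportion of events among the zeros, so the log coefficient is estimated from the positive values only.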
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/