Hello listers,
I wondered if I could get some opinions on two issues in building a regression model that I am struggling with. I am involved in a study that aims to construct a score that clinicians can calculate by hand to predict outcome from a number of predictors. My predictor variables are dichotomous, categorical and continuous. Many of the continuous variables have shapes that appear to be best modelled by fractional polynomials. Our original plan, however, was to divide each significant continuous variable into 5 equal-width categories and put them forward to the full model in that format. My concerns are:
1. By using a categorized approach we potentially lose power and accuracy in how that variable is defined, and it is then critically evaluated in that format in the final model. This may put all non-linear continuous variables at a disadvantage relative to the linear continuous variables, and reduce their likelihood of selection into the final model even though their predictive ability may actually be as high as, or higher than, that of the linear variables. One solution would be to enter them into the full model in their fractional polynomial form; however, I am concerned that if this is done we will not be able to interpret the coefficients well enough to determine how the manual outcome score should be constructed, which could be done relatively easily with the categorized approach. Any thoughts?
2. If a categorized approach is used, are there any thoughts on whether it is best to determine the cut-points using equal-width categories, equal-percentile categories, equal-risk categories, or a data-driven visual analysis of the fitted curves to identify meaningful cut-points?
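On point 1, one possible route (a sketch under stated assumptions, not the study's method) is to let a continuous variable compete for selection in its fractional polynomial form and only convert it to points afterwards, by tabulating the fitted curve over value bands. The Python below assumes a degree-2 FP with known powers and coefficients, follows the usual FP conventions (power 0 means ln(x); a repeated power gives x^p * ln(x)), scores each band at its midpoint, and treats `points_unit` (the change in linear predictor worth one point) as the analyst's choice. All function names and numbers are hypothetical.

```python
import math


def fp_term(x, p):
    """FP power transform: ln(x) when p == 0, otherwise x**p (x must be > 0)."""
    if x <= 0:
        raise ValueError("FP terms need x > 0; shift or rescale the variable first")
    return math.log(x) if p == 0 else x ** p


def fp2_curve(x, b1, b2, p1, p2):
    """Fitted degree-2 FP contribution b1*t1 + b2*t2 (intercept omitted).

    With a repeated power (p1 == p2) the second term is x**p * ln(x).
    """
    t1 = fp_term(x, p1)
    t2 = fp_term(x, p2) * math.log(x) if p1 == p2 else fp_term(x, p2)
    return b1 * t1 + b2 * t2


def points_table(bands, b1, b2, p1, p2, ref_band, points_unit):
    """Turn a fitted FP2 curve into integer points per value band.

    bands: list of (low, high) intervals covering the variable's range.
    ref_band: index of the band that scores 0 points.
    points_unit: change in the linear predictor worth one point (an
    assumption chosen by the analyst, not estimated here).
    Each band is scored at its midpoint; round() is Python's
    round-half-to-even.
    """
    mids = [(lo + hi) / 2 for lo, hi in bands]
    ref = fp2_curve(mids[ref_band], b1, b2, p1, p2)
    return [round((fp2_curve(m, b1, b2, p1, p2) - ref) / points_unit) for m in mids]


# Hypothetical fit: FP2 with powers (1, 2), i.e. 1.0*x + 1.0*x**2.
print(points_table([(0, 2), (2, 4), (4, 6)], b1=1.0, b2=1.0, p1=1, p2=2,
                   ref_band=0, points_unit=5.0))
# [0, 2, 6]
```

The bands here only affect how the final score is read off, not which variables are selected, so the coarsening happens after the FP term has already been evaluated in the full model.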
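On point 2, the difference between equal-width and equal-percentile cut-points is easiest to see on skewed data. The sketch below (plain Python 3.8+, using `statistics.quantiles`; the values are made up and the function names are hypothetical) computes both sets of cut-points for 5 groups and counts how many observations fall into each group. Equal-risk and visually chosen cut-points are not covered.

```python
import statistics


def equal_width_cuts(values, k=5):
    """Inner cut-points splitting [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_percentile_cuts(values, k=5):
    """Inner cut-points at the k-quantiles, aiming for equal group sizes."""
    return statistics.quantiles(values, n=k, method="inclusive")


def assign(value, cuts):
    """Group index 0..len(cuts); a value equal to a cut-point goes to the lower group."""
    return sum(value > c for c in cuts)


def group_counts(values, cuts):
    """Number of observations in each group defined by cuts."""
    return [sum(assign(v, cuts) == g for v in values) for g in range(len(cuts) + 1)]


vals = [1, 1, 1, 2, 2, 3, 4, 6, 10, 50]  # made-up, right-skewed values
print("equal width:", group_counts(vals, equal_width_cuts(vals)))
print("equal percentile:", group_counts(vals, equal_percentile_cuts(vals)))
# equal width: [9, 0, 0, 0, 1]
# equal percentile: [3, 2, 1, 2, 2]
```

With a skewed variable, equal-width groups can leave most observations in a single category, whereas equal-percentile groups keep group sizes roughly balanced at the cost of cut-points that may not fall on clinically convenient values.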
Thanks, Galina
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/