I am working on a problem that involves multivariable modeling of Y, a
time delay that is not only right-skewed but also has a fairly large
probability mass at 0 (i.e., 13% of subjects have Y=0). In particular,
I'm interested in the independent variables associated with unusually
long values of Y. So I decided to create a quantile regression (QR) model of
the 90th conditional percentile of Y. I did not use a logistic
regression approach (after dichotomizing Y at some arbitrary
unconditional cutpoint that represents a "long" delay) because of the
known problems with that approach (MacCallum R, Zhang S, Preacher K,
Rucker D. On the practice of dichotomization of quantitative variables.
Psychological Methods 2002;7(1):19-40).
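To make concrete what a QR model of the 90th conditional percentile estimates, here is a minimal stdlib-only Python sketch. It is not the model from the paper: the delays (in hours) and the single binary covariate `x` are hypothetical. QR minimizes the check (pinball) loss; with only an intercept and one 0/1 indicator the model is saturated, so the fit reduces to the 90th percentile within each group, and the slope is the difference between those two percentiles. No cutpoint on Y has to be chosen.

```python
from typing import List, Sequence, Tuple


def check_loss(u: float, tau: float) -> float:
    """Koenker-Bassett check (pinball) loss rho_tau(u)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values: Sequence[float], tau: float) -> float:
    """Return the smallest minimizer of sum(rho_tau(v - q)) over q.

    The objective is convex and piecewise linear, so a minimizer is always
    attained at a data point; we simply search those candidates.
    """
    best_q, best_loss = None, float("inf")
    for q in sorted(set(values)):
        loss = sum(check_loss(v - q, tau) for v in values)
        if loss < best_loss - 1e-12:  # accept only strict improvements
            best_q, best_loss = q, loss
    return best_q


def fit_binary_qr(y: List[float], x: List[int], tau: float) -> Tuple[float, float]:
    """Fit Q_tau(Y | x) = b0 + b1 * x for a hypothetical 0/1 covariate x.

    The model is saturated, so the check-loss objective separates by group:
    b0 is the tau-quantile of the x=0 group and b0 + b1 that of the x=1 group.
    """
    y0 = [yi for yi, xi in zip(y, x) if xi == 0]
    y1 = [yi for yi, xi in zip(y, x) if xi == 1]
    b0 = quantile_by_check_loss(y0, tau)
    b1 = quantile_by_check_loss(y1, tau) - b0
    return b0, b1


# Hypothetical delays in hours; 3 of 22 are zero (about 13.6%).
y0 = [0, 0, 1, 2, 3, 4, 5, 6, 7, 9, 20]
y1 = [0, 1, 2, 3, 5, 7, 9, 12, 15, 24, 40]
y = y0 + y1
x = [0] * len(y0) + [1] * len(y1)

print(fit_binary_qr(y, x, 0.9))  # (9, 15)
print(fit_binary_qr(y, x, 0.5))  # (4, 3)
```

In this made-up example the covariate shifts the 90th percentile by 15 hours but the median by only 3, the kind of tail-specific effect that a model of the center of the distribution would understate. With continuous covariates the check-loss minimization is solved by linear programming (e.g., Stata's `qreg` with `quantile(.9)`) rather than by this brute-force search.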
Here are two of the reviewer's comments on this paper:
1. The real virtue of quantile regression, as argued by its author, is
to explore covariate effect by estimating an entire family of
conditional quantile functions, albeit this has an implicit ordinal
aspect [R. Koenker and K. F. Hallock. Quantile regression. Journal of
Economic Perspectives 15 (4):143-156, 2001]. There may be heuristic
value in using a selective quantile regression, but this would seem to
reproduce the problem of logistic regression at a different level.
Moreover, quantile regression would presumably share the difficulty of
linear regression in explicitly modelling covariate effect at zero
probability. Such is not the case with zero-adjusted estimators within
the GLM family, as below.
2. The authors could consider (i) a count-data approach [Y could be
expressed in integer hours; fractional hours may be subject to
measurement error] and the various zero-inflated count estimators
available in Stata or (ii) for a continuous data approach, modelling via
zero-adjusted estimators within generalized linear models (GLM), using,
say, the inverse-Gaussian or gamma distribution both of which have
found utility in modelling skewed distributions [P. de Jong and G. Z.
Heller. Generalized Linear Models for Insurance Data, Cambridge,
UK: Cambridge University Press, 2008].
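The zero-adjusted (two-part) GLM suggested in (ii) splits the problem into P(Y > 0 | x) and E[Y | Y > 0, x]. The stdlib-only Python sketch below assumes hypothetical delays with one binary covariate `x`, a logit model for the first part and a gamma GLM with log link for the second. Because both parts are saturated in `x`, their maximum-likelihood fits have closed forms (group proportions of positive values, and group means of the positive values), so no iterative fitter is needed here; with real continuous covariates one would use a GLM routine such as Stata's `glm` with `family(gamma) link(log)`. The gamma shape parameter is not estimated in this sketch.

```python
import math
from typing import Dict, List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_two_part_binary(y: List[float], x: List[int]) -> Tuple[Dict[str, float], Dict[int, float]]:
    """Two-part (hurdle) model with one hypothetical 0/1 covariate x.

    Part 1: logistic model for P(Y > 0 | x).
    Part 2: gamma GLM with log link for E[Y | Y > 0, x].
    Assumes each group contains both zero and positive values, so every
    probability lies strictly between 0 and 1 and every positive mean exists.
    """
    fits = {}
    for g in (0, 1):
        yg = [yi for yi, xi in zip(y, x) if xi == g]
        pos = [yi for yi in yg if yi > 0]
        p = len(pos) / len(yg)        # MLE of P(Y > 0 | x = g)
        mu = sum(pos) / len(pos)      # MLE of E[Y | Y > 0, x = g]
        fits[g] = (p, mu)
    (p0, mu0), (p1, mu1) = fits[0], fits[1]
    coefs = {
        "logit_intercept": logit(p0),
        "logit_x": logit(p1) - logit(p0),            # log odds ratio of any delay
        "gamma_intercept": math.log(mu0),
        "gamma_x": math.log(mu1) - math.log(mu0),    # log ratio of mean positive delay
    }
    # The overall mean combines both parts: E[Y | x] = P(Y > 0 | x) * E[Y | Y > 0, x]
    means = {g: fits[g][0] * fits[g][1] for g in (0, 1)}
    return coefs, means


# Hypothetical delays in hours for the two covariate groups.
y0 = [0, 0, 1, 2, 3, 4, 5, 6, 7, 9, 20]
y1 = [0, 1, 2, 3, 5, 7, 9, 12, 15, 24, 40]
y = y0 + y1
x = [0] * len(y0) + [1] * len(y1)

coefs, means = fit_two_part_binary(y, x)
print(coefs)
print(round(means[0], 3), round(means[1], 3))  # 5.182 10.727
```

Both parts describe probabilities and means; turning this model into a statement about the 90th percentile would additionally lean on the gamma distributional assumption, whereas QR targets that percentile directly.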
My question is: Is he correct? Specifically, I am uncertain about the
validity of the criticisms of using QR that he raises in #1. I don't
dispute (as he indicates in #2) that alternative statistical approaches
are available for this question, but I still believe that a model of the
90th percentile is a legitimate approach to this question for this data
set and variable distribution.
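On criticism #1, two properties of sample quantiles are easy to check numerically. The stdlib-only Python sketch below uses hypothetical pooled delays with roughly 13% zeros and, to keep it short, unconditional rather than conditional quantiles. It estimates a whole family of quantiles by minimizing the check loss at several values of tau; quantiles at tau below the zero share fall on the point mass at 0, while the 90th percentile depends only on the ordering of the data around it, so moving the zeros to any value that stays below it leaves it unchanged.

```python
from typing import Dict, Sequence


def check_loss(u: float, tau: float) -> float:
    """Koenker-Bassett check (pinball) loss rho_tau(u)."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_by_check_loss(values: Sequence[float], tau: float) -> float:
    """Smallest minimizer of sum(rho_tau(v - q)); the optimum lies at a data point."""
    best_q, best_loss = None, float("inf")
    for q in sorted(set(values)):
        loss = sum(check_loss(v - q, tau) for v in values)
        if loss < best_loss - 1e-12:
            best_q, best_loss = q, loss
    return best_q


def quantile_family(values: Sequence[float], taus: Sequence[float]) -> Dict[float, float]:
    """Estimate a family of quantiles, one check-loss fit per tau."""
    return {tau: quantile_by_check_loss(values, tau) for tau in taus}


# Hypothetical pooled delays in hours: 3 of 22 are zero.
y = [0, 0, 1, 2, 3, 4, 5, 6, 7, 9, 20,
     0, 1, 2, 3, 5, 7, 9, 12, 15, 24, 40]
taus = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90]

family = quantile_family(y, taus)
print(family)  # {0.05: 0, 0.1: 0, 0.25: 2, 0.5: 5, 0.75: 9, 0.9: 20}

# Quantiles with tau below the zero share sit on the point mass at 0 ...
zero_share = sum(v == 0 for v in y) / len(y)
print(round(zero_share, 3))  # 0.136

# ... but the 90th percentile is unchanged if the zeros are moved to any value
# still below it (here, hypothetically replaced by 0.5 hours).
y_shifted = [0.5 if v == 0 else v for v in y]
print(quantile_by_check_loss(y_shifted, 0.90))  # 20
```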
Thus, I'd appreciate anyone's thoughts on: (a) the reviewer's criticisms
of using QR for this purpose and in this manner, and (b) the "best"
approach to this multivariable modeling problem.
Thanks,
Allan
--
Allan Garland, MD, MA
Associate Professor of Medicine &
Community Health Sciences
University of Manitoba
Health Sciences Center - GF 222
820 Sherbrook Street
Winnipeg, Manitoba R3A 1R9
phone: 204-787-1198
page: 204-935-2166
fax: 204-787-1087