RE: RE: st: ROC/logistic regression questions
From: "Clyde Schechter" <[email protected]>
To: [email protected]
Date: Tue, 1 Mar 2011 07:50:11 -0800
Junlin Liao inquired of me: "I'm wondering if you can explain the
coefficients in -rocfit- models? How do I explain it in plain English?"
The more familiar commands relating to ROC curves, -lroc-, -roctab-, and
-roccomp-, are non-parametric procedures: they trace the ROC curve
empirically from the observed data.
-rocfit- takes a different approach. The starting point is the same:
there is an actual dichotomous outcome, call it success vs failure, and
there is an ordinal observed variable which is being used to predict that
outcome, call it predictor. -rocfit- fits a parametric model, called the
binormal model, to the data.
First, the predictor variable itself, if discrete, is assumed to arise
through the application of cutpoints to an underlying continuous latent
variable. (If the observed predictor is itself continuous, the
-continuous()- option of -rocfit- groups it into categories before the
model is fit.)
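
To make the cutpoint idea concrete, here is a minimal sketch in Python
(not Stata, and not a description of how -rocfit- works internally). The
cutpoint values and the function name to_rating are invented purely for
illustration; in the binormal model the cutpoints are unknown quantities.

```python
from bisect import bisect_right

# Hypothetical cutpoints on the latent scale (illustrative values only).
CUTPOINTS = [-1.0, 0.0, 1.0, 2.0]


def to_rating(latent_value, cutpoints=CUTPOINTS):
    """Map a latent continuous value to an ordinal category.

    Categories run from 1 (below the first cutpoint) up to
    len(cutpoints) + 1 (at or above the last cutpoint).
    """
    return bisect_right(cutpoints, latent_value) + 1


# Three latent values falling in the lowest, a middle, and the highest category.
for value in (-2.5, 0.4, 3.1):
    print(value, "->", to_rating(value))
```

With four cutpoints this produces a five-category rating of the kind one
typically sees as the observed predictor in a rating-scale ROC study.
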
Second, the predictor variable, or its underlying latent variable, is
assumed to have a normal distribution among those cases with a success
outcome, and a (usually different) normal distribution among the cases
with a failure outcome. Each of those normal distributions is
characterized in the usual way by a mean and a standard deviation. Let's
call them mu_s and sd_s for the successes, and mu_f and sd_f for the
failures.
Digression: If the binormal model is actually true, and if sd_s = sd_f,
then it can be shown with a fairly simple calculation that the usual
logistic regression equation describes exactly the relationship between
the continuous (observed or latent) predictor and the probability of a
success outcome.
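
For anyone who wants to see that calculation, here is a sketch. The
symbols are notation introduced here, not anything -rocfit- reports: x is
the predictor value, S and F denote success and failure, \pi is the
overall proportion of successes, \sigma is the common standard deviation
(sd_s = sd_f), and \phi_s, \phi_f are the two normal densities.

```latex
% Bayes' rule on the log-odds scale. The normalizing constants of the two
% normal densities cancel because both have standard deviation \sigma.
\begin{align*}
\log\frac{P(S \mid x)}{P(F \mid x)}
  &= \log\frac{\pi}{1-\pi} + \log\frac{\phi_s(x)}{\phi_f(x)} \\
  &= \log\frac{\pi}{1-\pi} + \frac{(x-\mu_f)^2 - (x-\mu_s)^2}{2\sigma^2} \\
  &= \underbrace{\log\frac{\pi}{1-\pi} - \frac{\mu_s^2-\mu_f^2}{2\sigma^2}}_{\beta_0}
     + \underbrace{\frac{\mu_s-\mu_f}{\sigma^2}}_{\beta_1}\, x
\end{align*}
```

The x^2 terms cancel only because the two standard deviations are equal;
if sd_s and sd_f differ, a quadratic term in x remains and the log odds
are no longer linear in the predictor.
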
Third, -rocfit- estimates the parameters of those normal distributions.
But instead of providing them directly, it provides a different
characterization of the distributions which is sometimes of greater
interest. In particular, the slope in -rocfit-'s output estimates the
ratio sd_f/sd_s. And the intercept estimates the standardized difference
in means (mu_s-mu_f)/sd_s. Thus the slope characterizes the relative
dispersion of the continuous predictor among the successes and failures,
and the intercept is an effect size, strongly analogous to Cohen's d.
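
As a rough illustration of how the two coefficients relate to the
underlying normal distributions, here is a small Python sketch (not
Stata). It assumes that higher predictor values point toward success,
and it uses the standard binormal results that the ROC curve is
Phi(intercept + slope * invPhi(fpr)) and that the area under it is
Phi(intercept / sqrt(1 + slope^2)). The function names and the numeric
values are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_coefficients(mu_s, sd_s, mu_f, sd_f):
    """Return (intercept, slope) as defined in the text above."""
    intercept = (mu_s - mu_f) / sd_s  # standardized difference in means
    slope = sd_f / sd_s               # relative dispersion
    return intercept, slope


def binormal_roc(fpr, intercept, slope):
    """Sensitivity implied by the binormal model at a false-positive rate.

    fpr must lie strictly between 0 and 1.
    """
    return STD_NORMAL.cdf(intercept + slope * STD_NORMAL.inv_cdf(fpr))


def binormal_auc(intercept, slope):
    """Area under the binormal ROC curve."""
    return STD_NORMAL.cdf(intercept / sqrt(1 + slope ** 2))


# Illustrative parameter values only; these are not estimates from any data.
a, b = binormal_coefficients(mu_s=2.0, sd_s=1.25, mu_f=0.5, sd_f=1.0)
print("intercept:", a, "slope:", b)
print("sensitivity at 10% false positives:", binormal_roc(0.10, a, b))
print("area under the curve:", binormal_auc(a, b))
```

When the slope is 1 (equal standard deviations), the intercept alone
determines the whole curve, which is one way to see why it behaves like
an effect size.
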
That's about it. Frankly, I can count on the fingers of one hand the
number of times I have used the binormal model approach to ROC curves in
my work. In clinical work and general medical epidemiology it isn't very
popular. I imagine that like other models that involve latent variables,
one would find it more widely used in psychology and psychiatry, though I
don't really know.
Hope this helps.
Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
Please note new e-mail address: [email protected]