I wouldn't worry about reinventing the wheel. The wheel
was a very good idea. My own discipline, whatever it is,
is plagued with papers that propose square or other
non-circular wheels and then bask in the supposed
originality or creativity of the proposals.
I can't comment on your field, whatever it is. But
any measure that is a difference of probabilities
(proportions), as yours is, has clear anchor points
at +1, 0 and -1 and is attractive on that and other
grounds.
Other examples include some flavours of
rank correlations and related measures (including
the always popular Somers' d) and, on a more pedestrian
level, those discussed at
http://www.stata.com/support/faqs/data/measures.html
On a different level, it is neither necessary nor
efficient to create a new variable just to hold
a total, as in
. egen number_up_outcome=total(p_new>p_old & outcome)
. egen number_down_outcome=total(p_new<p_old & outcome)
. egen number_up_no_outcome=total(p_new>p_old & !outcome)
. egen number_down_no_outcome=total(p_new<p_old & !outcome)
Using -count- consistently will save you from a bundle
of such variables, all holding constants. For bullet-proof
code, you would need to trap missings (which count
as arbitrarily large in comparisons such as p_new > p_old)
and ensure that the two estimation samples are identical.
Note that missing values of -outcome- would count as true,
because Stata treats any nonzero value as true.
logistic outcome zlog zero
gen byte used = e(sample)
predict p_old
logistic outcome zlog zero new_marker
// script will fail if two samples differ
assert used == e(sample)
predict p_new
count if used
local N = r(N)
count if p_new > p_old & used & outcome == 1
local a = r(N)
count if p_new < p_old & used & outcome == 1
local b = r(N)
...
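For anyone who wants to run the whole sequence end to end, here is a minimal sketch under stated assumptions. The data are simulated purely for illustration (the seed, sample size and coefficients are arbitrary and have nothing to do with Daniel's data), and the local names c and d are my own labels for the two counts among subjects without the outcome. The final statistic is the proportion with improved predictions minus the proportion with worsened predictions.

```stata
* Simulated data for illustration only; seed and coefficients are arbitrary
clear
set seed 2718
set obs 500
generate zlog = rnormal()
generate byte zero = runiform() < 0.3
generate byte new_marker = runiform() < 0.4
generate byte outcome = runiform() < invlogit(-1 + 0.5*zlog + 0.7*zero + 0.8*new_marker)

* Old model: remember the estimation sample
quietly logistic outcome zlog zero
generate byte used = e(sample)
predict p_old if used, pr

* New model: stop if the estimation sample differs
quietly logistic outcome zlog zero new_marker
assert used == e(sample)
predict p_new if used, pr

* Counts held in local macros, not in variables
quietly count if used
local N = r(N)
quietly count if p_new > p_old & used & outcome == 1
local a = r(N)
quietly count if p_new < p_old & used & outcome == 1
local b = r(N)
quietly count if p_new < p_old & used & outcome == 0
local c = r(N)
quietly count if p_new > p_old & used & outcome == 0
local d = r(N)

* (a + c)/N is the proportion improved, (b + d)/N the proportion worsened
display "net proportion improved = " %6.3f ((`a' - `b') + (`c' - `d')) / `N'
```

Testing outcome == 1 and outcome == 0 explicitly, and restricting to used, keeps missing values out of every count.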
Nick
Daniel Waxman
> I am studying the effect of adding a biomarker to an existing model
> and want to describe the effect of that model vis-à-vis the number of
> subjects with improved predictions in the "new model" vs. the "old
> model". While there is an extensive literature on this topic, most of
> it divides the outcome into risk categories (i.e. predicted risk of
> 0-5%, 5-10%, etc.), something that I am not so interested in doing.
>
> An intuitive way to look at this would be to look at the net number of
> subjects who are assigned a higher predicted probability with the new
> model among those with the outcome in question, plus the net number
> assigned a lower probability among those who did not have the outcome.
> The ratio of this number to the total # of subjects would then be the
> proportion of patients with improved predictions (and would range from
> zero to 1). See example below.
>
> My question: Did I just reinvent the wheel? (e.g. is this equivalent
> to some existing statistic?) Does anybody see any logical problem with
> looking at this as one measure of the effect of adding a predictor to
> an existing model?
>
> Thanks,
> Daniel Waxman
>
> **** example: (where zlog is continuous, zero is dichotomous,
> new_marker is the dichotomous new marker, and there is no missing
> data) ***
>
>
> . logistic outcome zlog zero
> . predict p_old
>
> . logistic outcome zlog zero new_marker
> . predict p_new
>
> . count if e(sample)
> . gen N=r(N)
>
> . egen number_up_outcome=total(p_new>p_old & outcome)
> . egen number_down_outcome=total(p_new<p_old & outcome)
>
> . egen number_up_no_outcome=total(p_new>p_old & !outcome)
> . egen number_down_no_outcome=total(p_new<p_old & !outcome)
>
> . gen net_proportion_improved = ((number_up_outcome-number_down_outcome)+(number_down_no_outcome-number_up_no_outcome))/N