Re: st: Inconsistent results with rocfit
From
Ronan Conroy <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Inconsistent results with rocfit
Date
Tue, 2 Mar 2010 11:09:59 +0000
On 25 Feb 2010, at 18:30, Paul Seed wrote:
Dear Statalist,
An odd problem has come up.
I have two versions of the same predictor
(as measured and logged), and one binary outcome.
When I use -roctab-, I get identical estimates of the ROC area.
When I use -rocfit-, I do not.
The problem is reproducible. Using a dataset I'm currently working on,
and a similar setup to Paul's, with
. rocfit diagnosis logbnp1, cont(5)
I get an ROC area of 0.738, very similar to the 0.724 obtained from
-roctab-.
However,
. rocfit diagnosis bnp1, cont(5)
gives an ROC area of 0.358! -roctab- reports the same area as before,
0.724.
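The -roctab- area does not change because the nonparametric ROC area depends only on the rank order of the predictor, and taking logs preserves that order. A minimal sketch on synthetic data, meant to be run as a do-file (the variables d, x and logx are made up for illustration and are not the dataset discussed here):

```stata
* Synthetic data only; d, x and logx are hypothetical names.
clear
set seed 2010
set obs 130
generate x = exp(rnormal(0, 1.5))              // right-skewed predictor
generate logx = ln(x)                          // monotone transform of x
generate byte d = runiform() < invlogit(logx)  // binary outcome related to x
* The two nonparametric ROC areas should agree, because ln() keeps the ranks.
roctab d x
roctab d logx
```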
It seems to me that the problem is that the -continuous()- option divides
the range of the data into intervals of more or less equal length, rather
than into quantiles. The result is that where the variable is very skewed,
the group frequencies are very unequal. Here are the frequency
distributions of the grouped variables produced with -cont(5)-:
-> tabulation of cut_bnp1
   cut_bnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        109       83.85       83.85
          2 |         15       11.54       95.38
          3 |          3        2.31       97.69
          4 |          2        1.54       99.23
          5 |          1        0.77      100.00
------------+-----------------------------------
      Total |        130      100.00
-> tabulation of cut_logbnp1
cut_logbnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         24       18.46       18.46
          2 |         46       35.38       53.85
          3 |         54       41.54       95.38
          4 |          6        4.62      100.00
------------+-----------------------------------
      Total |        130      100.00
As you can see, logbnp1 ended up in four groups, of which three had
adequate numbers, while bnp1 had almost no observations in three of the
five categories. This is what we used to call a misfeature: something
that works as described in the manual, but does something that may not
be in the user's best interests. I'd suggest the addition of a -group-
option that allowed -continuous()- to produce n more or less equal-sized
groups.
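Until something like that exists, one possible workaround is to form the quantile groups by hand and pass the grouped variable to -rocfit- without the continuous option. A minimal sketch, assuming the variable names used in the commands above (q5_bnp1 is a hypothetical new variable; I have not checked the resulting area on this dataset):

```stata
* Hypothetical workaround; q5_bnp1 is an assumed new variable name.
xtile q5_bnp1 = bnp1, nquantiles(5)   // five groups of roughly equal size
tabulate q5_bnp1                      // check the group frequencies
rocfit diagnosis q5_bnp1              // fit on the grouped (ordinal) predictor
```

Because quantile groups depend only on rank order, the same grouping should result whether it is built from bnp1 or from logbnp1.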
The more alert (or anyone still reading this) will also note that
-cont(5)- produced five groups in the first instance and four in the
second. This seems to me like a bug.
This email has been cc'd to tech support!
Ronan Conroy
=================================
[email protected]
Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)
http://rcsi.academia.edu/RonanConroy