Re: st: Inconsistent results with rocfit
From: Ronan Conroy <[email protected]>
To: "[email protected]" <[email protected]>
Date: Tue, 2 Mar 2010 11:09:59 +0000
On 25 Feb 2010, at 18:30, Paul Seed wrote:
Dear Statalist,
An odd problem has come up.
I have two versions of the same predictor
(as measured and logged), and one binary outcome.
When I use -roctab-, I get identical estimates of the ROC area.
When I use -rocfit-, I do not.
The problem is reproducible. Using a dataset I'm currently working on,  
and a similar setup to Paul's, with
. rocfit diagnosis logbnp1 , cont(5)
I get an ROC area of 0.738, very similar to the 0.724 obtained from
-roctab-.
However,
. rocfit diagnosis bnp1, cont(5)
gives an ROC area of 0.358! -roctab- reports the same area as before,
0.724.
It seems to me that the problem is that the -cont()- option divides the
range of the data into intervals of more or less equal length, rather
than into quantiles. The result is that where the variable is very
skewed, the group frequencies are very unequal. Here are the frequency
distributions of the grouped variables produced with -cont(5)-:
-> tabulation of cut_bnp1
   cut_bnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        109       83.85       83.85
          2 |         15       11.54       95.38
          3 |          3        2.31       97.69
          4 |          2        1.54       99.23
          5 |          1        0.77      100.00
------------+-----------------------------------
      Total |        130      100.00
-> tabulation of cut_logbnp1
cut_logbnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         24       18.46       18.46
          2 |         46       35.38       53.85
          3 |         54       41.54       95.38
          4 |          6        4.62      100.00
------------+-----------------------------------
      Total |        130      100.00
As you can see, logbnp1 ended up in four groups, of which three had
adequate numbers, while bnp1 had almost no observations in three of the
five categories. This is what we used to call a misfeature: something
that works as described in the manual, but does something that may not
be in the user's best interests. I'd suggest the addition of a -group-
option that allowed -continuous()- to produce n groups of more or less
equal size.
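Until something like that exists, the grouping can be done by hand and the
grouped variable passed to -rocfit- directly. What follows is only a minimal
sketch under stated assumptions: the data are simulated (they are not the
data tabulated above), and the seed, the distribution parameters and the new
variables eq5_* and q5_* are invented for illustration. The equal-width rule
in the loop is an assumption about what -cont()- does, inferred from the
tabulations above rather than checked against the -rocfit- source.

```stata
* Simulated data for illustration only (hypothetical values)
clear
set seed 20100302
set obs 130
generate byte diagnosis = runiform() < 0.4
generate double bnp1 = exp(rnormal(4 + 0.8*diagnosis, 1.2))   // right-skewed
generate double logbnp1 = ln(bnp1)

* Equal-width groups: split the observed range into 5 intervals of equal length
foreach v of varlist bnp1 logbnp1 {
    quietly summarize `v', meanonly
    generate byte eq5_`v' = 1 + floor(5 * (`v' - r(min)) / (r(max) - r(min)))
    quietly replace eq5_`v' = 5 if eq5_`v' > 5   // the maximum goes in group 5
}

* Quantile groups: 5 groups of (nearly) equal size
xtile q5_bnp1    = bnp1,    nquantiles(5)
xtile q5_logbnp1 = logbnp1, nquantiles(5)

* Compare the group sizes under the two rules
tabulate eq5_bnp1
tabulate eq5_logbnp1
tabulate q5_bnp1 q5_logbnp1

* Fit the binormal model to the pre-grouped predictor
rocfit diagnosis q5_bnp1
```

Because the log transform is monotone, the cross-tabulation of q5_bnp1 and
q5_logbnp1 should fall entirely on the diagonal, so the quantile-grouped fit
should not depend on whether the predictor was logged. The equal-width groups
carry no such guarantee, which would be consistent with the discrepancy
reported above.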
The more alert (or anyone still reading this) will also note that
-cont(5)- produced five groups in the first instance and four in the
second. This seems to me like a bug.
This email has been cc'd to tech support!
Ronan Conroy
=================================
[email protected]
Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)
http://rcsi.academia.edu/RonanConroy