All,
I'm using -rndbinx- to generate synthetic datasets. However, there
seems to be a discontinuity in its output at certain denominator
values. I have localized the problem to the example below; I hope
that someone here can either spot my error or confirm that there is
a problem with -rndbinx-:
==================================================================
. u temp, clear
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
out | 14990 .0365198 .0644039 0 1
. centile out, c(20)
-- Binom. Interp. --
Variable | Obs Percentile Centile [95% Conf. Interval]
-------------+-------------------------------------------------------------
out | 14990 20 .0105 .01 .0112
. local cut20=round(r(c_1),0.0001)
. bsample 10000
. gen denom=190
. rndbinx out denom
( Generating ................... )
Variable bnlx created.
. sum bnlx
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
bnlx | 10000 7.044 12.81637 0 190
. gen rate1=bnlx/denom
. drop bnlx
. replace denom=191
(10000 real changes made)
. rndbinx out denom
( Generating ................. )
Variable bnlx created.
. sum bnlx
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
bnlx | 10000 7.0998 12.91846 0 191
. gen rate2=bnlx/denom
. drop bnlx
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
out | 10000 .0371277 .066508 0 1
denom | 10000 191 0 191 191
rate1 | 10000 .0370737 .0674546 0 1
rate2 | 10000 .0371717 .0676359 0 1
. count if rate1<`cut20'
2191
. count if rate2<`cut20'
2945
========================================================================
The problem is apparent in the last two commands: the tail of the
distribution balloons when the denominator is increased from 190 to
191. There is nothing special about 190->191; the location of the
discontinuity seems to vary with the underlying probability -out-.
However, when I loop the above over denominator values from, say, 1
to 500, the tail count drops off smoothly until the denominator hits
a certain value (here, 191), jumps up, drops off smoothly again for
roughly another 100 values, and then jumps up again.
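In case it matters, the loop is essentially this (a minimal sketch,
not my exact do-file):
==================================================================
use temp, clear
centile out, c(20)                   // 20th centile of -out-
local cut20 = round(r(c_1),0.0001)   // tail cutoff, as above
forvalues d = 1/500 {
    quietly {
        use temp, clear
        bsample 10000                // fresh bootstrap sample each pass
        gen denom = `d'
        rndbinx out denom            // creates bnlx
        gen rate = bnlx/denom
        count if rate < `cut20'      // size of the lower tail
    }
    display `d' _col(10) r(N)        // denominator vs. tail count
}
==================================================================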
I'm trying to simulate the sensitivity & specificity of a
classification scheme as a function of the denominator, but the
result is a very unnatural-looking relationship.
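A cross-check that might localize this further: repeat the
denom=190/191 comparison with Stata's built-in rbinomial() (available
in Stata 10.1 or later, I believe). If the same jump appears there,
the problem is not specific to -rndbinx-. A sketch, continuing from
the denom=191 run above (bnl2 and rate3 are just names I've made up):
==================================================================
gen bnl2 = rbinomial(denom, out)     // built-in binomial draw
gen rate3 = bnl2/denom
count if rate3 < `cut20'             // compare with the counts above
==================================================================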
Any thoughts?
thanks in advance,
Jeph