moleps islon <[email protected]> :
Just to be clear: B causes Z and B causes A, but you don't observe B,
right? Let's ignore the survival model you are no doubt estimating,
and suppose you have gotten an estimate of P(Z|A)=.05 with a SE near
zero (a confidence interval of width zero). Now you want to estimate
P(Z|B) and P(A|B), and you think P(Z|B) is near .65 and
P(Z|~B)=6/100000 (I assume "background incidence" is the probability
of Z given not B here; that may reflect my "background ignorance").
You will need much more information to make any progress!

Let p=P(B) in the population, y=P(Z|B), x=P(A|B), and w=P(A|~B). Note
that ~B means "not B" or B==0. Then, assuming Z depends on A only
through B (so that P(Z|A,B)=P(Z|B) and P(Z|A,~B)=P(Z|~B)),
  P(Z|A) = P(Z|B)P(B|A) + P(Z|~B)P(~B|A)
         = [ypx + .00006(1-p)w] / [(1-p)w + px]
so even if you assume P(Z|A)=.05 and y=.65, you have 3 unknowns (p, x,
and w) and 1 equation; even if you know p, you have two unknowns w and
x, so the best you can hope for is to express P(A|B) as a constant
multiple of P(A|~B). For example, if p=.5 and y=.65 and P(Z|A)=.05
then w is about 12 times as big as x (i.e. if Z is so rare in a sample
of A, when B so likely causes Z, it must be because A is much more
likely when not B than when B). If p is 8% then w and x are roughly
the same (w is about 1.04 times x). I suggest you draw out a couple
of trees with probabilities and check my math.
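
One way to check the arithmetic above is to solve the P(Z|A) equation for the ratio w/x. The following is a minimal Python sketch, not part of the original exchange; the function name ratio_w_over_x and the symbols q (for P(Z|A)) and b (for the background P(Z|~B)) are introduced here for illustration only.

```python
def ratio_w_over_x(p, y, q, b=6 / 100000):
    """Return w/x = P(A|~B) / P(A|B) implied by P(Z|A) = q.

    Rearranging q * ((1 - p) * w + p * x) = y * p * x + b * (1 - p) * w
    gives w * (1 - p) * (q - b) = x * p * (y - q).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not b < q < y:
        raise ValueError("need b < q < y for a positive, finite ratio")
    return p * (y - q) / ((1 - p) * (q - b))


# The two cases discussed in the text: p = .5 and p = .08
for p in (0.5, 0.08):
    print(f"p = {p}: w/x = {ratio_w_over_x(p, y=0.65, q=0.05):.3f}")
```

The ratio depends on p only through the odds p/(1-p), which is why the answer moves so much between p=.5 and p=.08.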
If you want to estimate y and x, you are out of luck. If you know w
and p with certainty, you can express y as a function of x and the
estimate of P(Z|A), so if you have estimates of P(Z|A) in memory, you
can use -lincom- to get estimates of y conditional on x, but how
plausible is it you would know w with certainty when you are trying to
estimate x and y?
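
To make the "y as a function of x" idea concrete: with w and p fixed, the P(Z|A) equation solved for y is linear in P(Z|A), which is the reason a linear-combination tool such as -lincom- would apply. Below is a rough Python sketch of that algebra only, not of Stata's own computations. The crude proportion 11/217 comes from the question; the values x=w=.10 and p=.08, the Wald interval, and the function names are assumptions made purely for illustration.

```python
import math

B_RATE = 6 / 100000  # background P(Z|~B), from the original question


def y_given(q, x, w, p, b=B_RATE):
    """Solve the P(Z|A) identity for y = P(Z|B); the result is linear in q."""
    return (q * ((1 - p) * w + p * x) - b * (1 - p) * w) / (p * x)


def y_interval(q_hat, se_q, x, w, p, z=1.96):
    """Point estimate and Wald interval for y, holding x, w, p fixed."""
    slope = ((1 - p) * w + p * x) / (p * x)  # dy/dq, constant in q
    y_hat = y_given(q_hat, x, w, p)
    half = z * slope * se_q  # SE of a linear function is |slope| * SE
    return y_hat, y_hat - half, y_hat + half


# Crude proportion from the question: 11 cases of Z among 217 cases of A.
# This ignores follow-up time, as the discussion above does.
q_hat = 11 / 217
se_q = math.sqrt(q_hat * (1 - q_hat) / 217)

# x, w and p below are hypothetical inputs, not estimates.
print(y_interval(q_hat, se_q, x=0.10, w=0.10, p=0.08))
```

Because the slope is roughly 1/p when w and x are similar, a small p inflates the interval for y considerably.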
I suppose you could use known p, estimates of P(Z|A) in memory, and
-lincom-, to get estimates of y conditional on x and w, then present a
table of point estimates and confidence intervals for various values
of x and w. Or get estimates of x conditional on y and w, or what
have you. But you still have to assume you know p with certainty, or
the dimension of that table gets out of control...
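
The table described above could be sketched along these lines in Python. This is only an illustration under stated assumptions: P_B=.08 is a hypothetical "known" p, the grids for x and w are arbitrary, and the interval for P(Z|A) is a crude Wald interval from 11/217 that ignores follow-up time. Since y is an increasing linear function of P(Z|A), the interval endpoints are simply mapped through the same formula.

```python
import math

B_RATE = 6 / 100000  # background P(Z|~B), from the question
P_B = 0.08           # hypothetical "known" P(B); an assumption


def y_from_q(q, x, w, p=P_B, b=B_RATE):
    """y = P(Z|B) implied by P(Z|A) = q for given x, w and p."""
    return (q * ((1 - p) * w + p * x) - b * (1 - p) * w) / (p * x)


def build_table(q_lo, q_hat, q_hi, xs, ws):
    """One row per (x, w): estimate, interval, and an admissibility flag."""
    rows = []
    for x in xs:
        for w in ws:
            est, lo, hi = (y_from_q(q, x, w) for q in (q_hat, q_lo, q_hi))
            admissible = 0.0 <= est <= 1.0  # implied y must be a probability
            rows.append((x, w, est, lo, hi, admissible))
    return rows


q_hat = 11 / 217
se_q = math.sqrt(q_hat * (1 - q_hat) / 217)
grid = (0.05, 0.10, 0.20)  # arbitrary illustrative values for x and w
rows = build_table(q_hat - 1.96 * se_q, q_hat, q_hat + 1.96 * se_q, grid, grid)

print(f"{'x':>5} {'w':>5} {'y':>7} {'lo':>7} {'hi':>7}  ok")
for x, w, est, lo, hi, ok in rows:
    print(f"{x:5.2f} {w:5.2f} {est:7.3f} {lo:7.3f} {hi:7.3f}  {ok}")
```

Rows flagged as not admissible imply a y outside [0,1], meaning that combination of x, w, and p cannot be reconciled with the observed P(Z|A).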
I have been assuming that P(Z|A) is what you are estimating, but you
really have a competing risk model, I am guessing, modeling the hazard
of getting Z before death or censoring by some other process. So you
need to redefine Z to be not "gets condition Z" but "gets condition Z
in my observation period" to use any of the above, which is probably
unpalatable. Plus, I don't know if I've translated your description
into probabilities correctly--the jargon of genetics is unfamiliar to
me (and many other list members--you should translate to the common
language of statistics).
On Sun, Mar 22, 2009 at 10:37 AM, moleps islon <[email protected]> wrote:
> Dear statalisters,
> I'm studying a tumor A that has a probability (x) of being linked to
> a genetic mutation (B) that also predisposes (penetrance approx 65% (y)
> by 70 years) to condition Z. Now I've got 217 cases of A that resulted
> in 11 cases of Z over 8534 years of follow-up (among the 217
> cases). I need to determine the number of patients with B given that
> there is also a background incidence of 6/100000 for Z. We know that
> x<<y. Besides running a simulation, is there a more analytical way of
> estimating x and y given my data?
>
> Best wishes,
> Moleps