Arnold Kester wrote:
> If you drop observations based on their value of a predictor variable
> you are in fact changing the protocol of your study. The inclusion
> criteria are changed to include "Troponin I is detectable". Results
> would be valid for people with detectable values only.
Not really, as long as the probability of being detectable is independent
of your dependent variable, given x. With any type of regression you want
to know the distribution of y conditional on x: f(y|x). However, you only
have information on the conditional distribution of y given x if x is
detectable (D=1): f(y|x,D=1). Using the basic rules of conditional
probability you can show that as long as the probability of observing x
does not depend on y, i.e. Pr(D=1|y,x) = Pr(D=1|x), then
f(y|x,D=1) = f(y|x).
To see this, write
f(y|x,D=1) = f(y,x,D=1) / f(x,D=1)
           = { Pr(D=1|y,x)*f(y|x)*f(x) } / { Pr(D=1|x)*f(x) }
           = f(y|x) * Pr(D=1|y,x) / Pr(D=1|x)
If Pr(D=1|y,x) = Pr(D=1|x), then Pr(D=1|y,x) / Pr(D=1|x) = 1, so
f(y|x,D=1) = f(y|x) * 1 = f(y|x)
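A small simulation may make the argument above more concrete. This is only a
sketch in Python (not Stata), and every number in it is an assumption chosen
for illustration: a true slope of 2, standard normal x and error, a cutoff of
x > -0.5 standing in for "detectable", and a cutoff of y > 0 as an example of
selection that does depend on y.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(12345)
n = 20000

# Hypothetical data-generating process: y = 1 + 2*x + e
x = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

# Case 1: D depends on x only, so Pr(D=1|y,x) = Pr(D=1|x)
keep_x = [(xi, yi) for xi, yi in zip(x, y) if xi > -0.5]

# Case 2: D depends on y, so the condition is violated
keep_y = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]

slope_all = ols_slope(x, y)
slope_sel_x = ols_slope(*zip(*keep_x))
slope_sel_y = ols_slope(*zip(*keep_y))

print("full sample:       ", round(slope_all, 3))
print("selected on x only:", round(slope_sel_x, 3))
print("selected on y:     ", round(slope_sel_y, 3))
```

Under these assumptions the slope from the sample selected on x should stay
close to 2, while the slope from the sample selected on y should be
noticeably attenuated.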
> If you want to get a prediction for undetectable Troponin without
> assuming a specific value you could add a dummy variable troponin_zero =
> (troponin == 0) and substitute (say) zero for log(troponin) when
> troponin==0. The predicted value from this model is independent of what
> you choose for "log(0)".
Running a regression on a variable whose missing values have been `imputed',
together with a dummy variable indicating whether each value was imputed,
will generally lead to biased estimates.
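One way this bias can show up is sketched below in Python. The setup is an
assumption made purely for illustration and is not taken from the troponin
data: a second covariate z correlated with x, true coefficients of 1 on both,
and x missing completely at random for half of the cases. The missing x
values are set to 0 and a dummy `missing` is added, and the result is
compared with a complete-case regression.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(2002)
n = 20000
rows_cc, y_cc = [], []   # complete cases: constant, x, z
rows_dv, y_dv = [], []   # dummy-variable method: constant, x_fill, z, missing

for _ in range(n):
    # Hypothetical model: y = 1 + x + z + e, with corr(x, z) = 0.7
    x = random.gauss(0, 1)
    z = 0.7 * x + random.gauss(0, 1) * (1 - 0.7 ** 2) ** 0.5
    y = 1 + x + z + random.gauss(0, 1)
    missing = random.random() < 0.5   # x missing completely at random
    if not missing:
        rows_cc.append([1.0, x, z])
        y_cc.append(y)
    x_fill = 0.0 if missing else x    # arbitrary fill-in value
    rows_dv.append([1.0, x_fill, z, 1.0 if missing else 0.0])
    y_dv.append(y)

b_cc = ols(rows_cc, y_cc)
b_dv = ols(rows_dv, y_dv)

print("coefficient on z, complete cases:", round(b_cc[2], 3))
print("coefficient on z, dummy method:  ", round(b_dv[2], 3))
```

In this setup the complete-case coefficient on z should be close to its true
value of 1, while the dummy-variable method should overstate it, because x is
not controlled for among the cases where it was filled in.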
Both points are made in one of those convenient little green sage booklets:
Paul Allison (2002) Missing Data. Thousand Oaks, Sage.
Hope this helps,
Maarten