This sounds like an application for a two-part or hurdle model:
You want to compare two things between the populations (or against a standard): the frequency of values below the detection limit, and the level of the continuous variable among values above the detection limit.
My inclination is NOT to use imputation - you already know these are below the detection limit, so why impute something larger than that? I've been wary of Tobit models since I read somewhere (and I don't remember where, darn it) that they are quite sensitive to the normality assumption.
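A minimal sketch of the two-part idea in Stata, assuming undetectable values are coded as 0 (as in the tabulation in the original message). The covariate name group is hypothetical and does not appear in the original post. The log transform in the second part is also an assumption, suggested only by the skewness of the detectable values.

```stata
* Sketch only: "group" is a hypothetical binary covariate (assumption)
* Undetectable values are assumed to be coded as 0
generate byte detected = cpies_DNA_max > 0 if !missing(cpies_DNA_max)

* Part 1: probability of a detectable value
logit detected group

* Part 2: biomarker level among detectable values, on the log scale
generate double ln_dna = ln(cpies_DNA_max) if detected == 1
regress ln_dna group if detected == 1
```

With only 11 detectable values out of 132, the second part will be estimated very imprecisely, so most of the information is likely to come from the first part.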
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eduardo Nunez
Sent: Tuesday, November 17, 2009 6:02 AM
To: [email protected]
Subject: st: Biomarker with lower detection limits
Dear statalisters:
I wonder if anyone can advise me on the best way to analyse continuous
variables with lower detection limits (i.e., left-censored variables).
In particular, I have data on a biomarker with 92% of values reported
as "undetectable", and I am trying to run 2 models:
1) a linear regression using it as the dependent variable, and
2) stcox with mortality as the outcome and the biomarker as the main exposure.
. tab cpies_DNA_max, m

  cpies_DNA |
       _max |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        121       91.67       91.67
       2.85 |          1        0.76       92.42
      4.721 |          1        0.76       93.18
      5.059 |          1        0.76       93.94
      5.165 |          1        0.76       94.70
      6.267 |          1        0.76       95.45
      8.009 |          1        0.76       96.21
      9.965 |          1        0.76       96.97
     30.538 |          1        0.76       97.73
     35.137 |          1        0.76       98.48
         50 |          1        0.76       99.24
     71.227 |          1        0.76      100.00
------------+-----------------------------------
      Total |        132      100.00

Censored values occur in environmental, metabolomics, and proteomics
data, most commonly when the level of a biomarker in a sample is less
than the limit of quantification of the machine; these values are
generally reported as being below detection, with the detection limit
(DL) specified (for instance, "< 2.5").
Several solutions have been proposed, such as replacing those values
with zero, the DL, DL/2, or a random value from a distribution over
the range from zero to the DL. However, none of them has been
demonstrated to be optimal in simulation studies.
What I don't want is to eliminate those values and run the analysis on
complete cases.
Is it possible to use multiple imputation to replace those values?
If this is an option, how can I tell the imputation method not to
generate values above the DL?
Is tobit an appropriate model for the first analysis? Because of the
marked skewness, should I normalize the variable by transforming only
the values above the DL?
Best regards,
Eduardo
Eduardo Nunez, MD, MPH
Epidemiology Department
Department of Cardiology. Hospital Clínico Universitario. Universitat de
València. València. Spain.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/