Eduardo Nunez <[email protected]> :
An alternative approach to consider: make a dummy variable that is one
when your biomarker is nonzero. Then run a probit, logit, or
alternative model using the dummy for "above the detection limit" as
the outcome. You can also use that dummy as an explanatory variable in
stcox, possibly also with a second variable measuring values above the
detection limit. Do you know the detection limit? Do you know
details about the physical process that would tell you about the
distribution of these measurements conditional on some X? If so, you
can write a -ml- routine using that information.
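
A minimal sketch of the dummy-variable approach described above. It
assumes undetectable values are coded 0 in cpies_DNA_max, as in your
tabulation. The names age, time, and died are hypothetical placeholders
for your own covariate, follow-up time, and death indicator, and the log
transform of detected values is only one possible choice for the second
variable.

```stata
* Sketch only: age, time, and died are hypothetical variable names.
* Indicator equal to 1 when the biomarker is above the detection limit
* (assumes undetectable values are stored as 0).
generate byte detected = cpies_DNA_max > 0 if !missing(cpies_DNA_max)

* Model 1: detection status as the outcome
logit detected age

* Second variable: log level among detected values, 0 otherwise
generate double log_level = cond(detected == 1, ln(cpies_DNA_max), 0) ///
    if !missing(detected)

* Model 2: Cox model with the indicator and the level above the limit
stset time, failure(died)
stcox detected log_level age
```

With only 11 detected values, either model will support very few
covariates, so treat the estimates as exploratory.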
On Tue, Nov 17, 2009 at 9:01 AM, Eduardo Nunez <[email protected]> wrote:
> Dear statalisters:
>
> I wonder if anyone can advise me on the best way to analyse continuous
> variables with lower detection limits (that is, left-censored values).
> In particular, I have data on a biomarker with 92% of values reported
> "undetectable" and I am trying to run 2 models:
> 1) a linear regression using it as the dependent variable, and
> 2) stcox with mortality as outcome and the biomarker as the main exposure.
>
>
> . tab cpies_DNA_max, m
>
>   cpies_DNA |
>        _max |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           0 |        121       91.67       91.67
>        2.85 |          1        0.76       92.42
>       4.721 |          1        0.76       93.18
>       5.059 |          1        0.76       93.94
>       5.165 |          1        0.76       94.70
>       6.267 |          1        0.76       95.45
>       8.009 |          1        0.76       96.21
>       9.965 |          1        0.76       96.97
>      30.538 |          1        0.76       97.73
>      35.137 |          1        0.76       98.48
>          50 |          1        0.76       99.24
>      71.227 |          1        0.76      100.00
> ------------+-----------------------------------
>       Total |        132      100.00
>
>
> Censored values occur in environmental, metabolomics, and proteomics
> data, most commonly when the level of a biomarker in a sample is less
> than the limit of quantification of the machine; these values are
> generally reported as below the detection limit (DL), with the DL
> specified (for instance, "< 2.5").
> Several solutions have been proposed, such as replacing those values
> with zeros, the DL, DL/2, or a random value drawn from a distribution
> over the range from zero to the DL. However, none of them has been
> demonstrated to be optimal in simulation studies.
> What I don't want is to eliminate those values and run the analysis on
> complete cases.
> Is it possible to use multiple imputation for replacing those values?
> If this is an option, how can I tell the imputation method not to
> impute values above the DL?
> Is tobit an appropriate model for the first analysis? Because of marked
> skewness, should I normalize the variable by transforming only the
> values above the DL?
>