Thank you so much for your help.
Although I know the DL, I have no idea about the distribution of the biomarker.
I may try all the options suggested. However, I would like to know whether
Stata can fit a hurdle model.
Best wishes,
Eduardo
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This sounds like an application for a two-part or hurdle model:
you want to compare the frequency of values below the detection limit in two
populations (or against a standard), and the continuous variable above the
detection limit.
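Here is a rough sketch of the two-part idea in Stata. It assumes only the 132 values tabulated in Eduardo's original message, with "undetectable" coded 0, and no covariates. Any covariates (a hypothetical group indicator, for example) would be added to both model lines. Part 1 models whether a sample is above the detection limit. Part 2 models the log level among detected samples only. Nothing is imputed for the undetectable values.

```stata
* Rebuild the tabulated data: 121 undetectable values coded 0, plus 11 detected values
clear
set obs 132
generate double cpies_DNA_max = 0
local detected 2.85 4.721 5.059 5.165 6.267 8.009 9.965 30.538 35.137 50 71.227
local i = 1
foreach v of local detected {
    quietly replace cpies_DNA_max = `v' in `i'
    local ++i
}

* Part 1: binary "above the detection limit" indicator, modelled with logit
generate byte detect = cpies_DNA_max > 0
logit detect                     // add covariates here

* Part 2: level among detected samples only, on the log scale to tame the skewness
generate double ln_dna = ln(cpies_DNA_max) if detect
regress ln_dna                   // add the same covariates here
```

The two parts share no parameters, so their log likelihoods simply add. Fitting them one after the other therefore gives the same coefficient estimates as fitting the two-part model jointly. Later Stata releases (14 onward) also include a built-in -churdle- command for Cragg-type hurdle models.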
My inclination is NOT to use imputation - you already know these are
below the detection limit, so why impute something larger than that?
I've been wary of Tobit models since I read somewhere (and I don't
remember where, darn it) that they are quite sensitive to the
normality assumption.
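One way to keep the Tobit idea while easing the normality worry somewhat is to fit the censored-normal model on the log scale, censoring at the actual detection limit rather than at 0. The sketch below uses -intreg-. An undetectable sample is recorded as "somewhere below ln(DL)", which is a missing lower bound, and a detected sample is recorded as an exact value. The sketch assumes the tabulated data and a hypothetical DL of 2.5, borrowed from the "< 2.5" example in the original question. The real DL should be substituted, and log-normality is still an assumption that needs checking.

```stata
* Rebuild the tabulated data: 121 undetectable values coded 0, plus 11 detected values
clear
set obs 132
generate double cpies_DNA_max = 0
local detected 2.85 4.721 5.059 5.165 6.267 8.009 9.965 30.538 35.137 50 71.227
local i = 1
foreach v of local detected {
    quietly replace cpies_DNA_max = `v' in `i'
    local ++i
}

local DL = 2.5                   // HYPOTHETICAL detection limit; use the real one

* Interval bounds on the log scale:
*   detected:      lower = upper = ln(value)    (observed exactly)
*   undetectable:  lower = . (minus infinity), upper = ln(DL)   (left-censored)
generate double ln_lo = cond(cpies_DNA_max > 0, ln(cpies_DNA_max), .)
generate double ln_hi = cond(cpies_DNA_max > 0, ln(cpies_DNA_max), ln(`DL'))

intreg ln_lo ln_hi               // covariates go after the two bounds
```

With a single common DL this is the same model as -tobit- with ll() set to ln(DL). The advantage of -intreg- is that samples with different detection limits can each carry their own upper bound.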
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
On Tue, Nov 17, 2009 at 1:24 PM, Maarten Buis wrote:
> --- On Tue, 17/11/09, Lachenbruch, Peter wrote:
>> My inclination is NOT to use imputation - you already know
>> these are below the detection limit, so why impute something
>> larger than that?
>
> Alternatively, you could use multiple imputation, as long as
> your imputation model respects the information you have
> about your variable. This is the kind of problem Patrick
> Royston seems to have had in mind when writing this update to his
> -ice- command:
>
> Patrick Royston (2007) Multiple imputation of missing values:
> further update of ice, with an emphasis on interval censoring.
> The Stata Journal, 7(4):445-464.
> http://www.stata-journal.com/article.html?article=st0067_3
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
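Below is a simplified sketch of imputation that respects the DL. First fit the censored log-normal model with -intreg-. Then, for each undetectable sample, draw a value from the fitted normal distribution truncated above at ln(DL), using the inverse-CDF trick.

This is NOT Royston's -ice- procedure. It assumes the tabulated data, a hypothetical DL of 2.5, and five imputations. It also keeps the fitted parameters fixed instead of drawing them afresh for each imputation, so a proper multiple imputation would show more between-imputation variability. Stored-result names can differ slightly across Stata versions.

```stata
* Rebuild the tabulated data: 121 undetectable values coded 0, plus 11 detected values
clear
set obs 132
generate double cpies_DNA_max = 0
local detected 2.85 4.721 5.059 5.165 6.267 8.009 9.965 30.538 35.137 50 71.227
local i = 1
foreach v of local detected {
    quietly replace cpies_DNA_max = `v' in `i'
    local ++i
}

local DL = 2.5                   // HYPOTHETICAL detection limit

* Censored log-normal model: undetectable = "below ln(DL)", detected = exact
generate double ln_lo = cond(cpies_DNA_max > 0, ln(cpies_DNA_max), .)
generate double ln_hi = cond(cpies_DNA_max > 0, ln(cpies_DNA_max), ln(`DL'))
intreg ln_lo ln_hi
predict double xb, xb
scalar sig = exp(_b[/lnsigma])   // fitted sigma of log(biomarker)

* Probability mass of the fitted normal lying below ln(DL)
generate double pmax = normal((ln(`DL') - xb)/sig)

set seed 20091117
forvalues m = 1/5 {
    generate double dna_imp`m' = cpies_DNA_max
    * u ~ Uniform(0, pmax) makes invnormal(u) a standard-normal draw truncated
    * above at (ln(DL) - xb)/sig, so every imputed value stays below the DL
    quietly replace dna_imp`m' = exp(xb + sig*invnormal(pmax*runiform())) if cpies_DNA_max == 0
}
```

Each completed variable, dna_imp1 to dna_imp5, would then be analysed separately and the results combined with Rubin's rules. By construction, no imputed value can exceed the DL, which is the constraint raised in the original question.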
On Tue, Nov 17, 2009 at 2:08 PM, Austin Nichols wrote:
> Eduardo Nunez:
> An alternative approach to consider: make a dummy variable that is one
> when your biomarker is nonzero. Then run a probit, logit, or
> alternative model using the dummy for "above the detection limit" as
> the outcome. You can also use that dummy as an explanatory variable in
> stcox, possibly also with a second variable measuring values above the
> detection limit. Do you know the detection limit? Do you know
> details about the physical process that would tell you about the
> distribution of these measurements conditional on some X? If so, you
> can write a -ml- routine using that information.
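Austin's suggestion for the Cox model might look like the sketch below. It uses a detected/undetectable indicator plus a second variable carrying the log level among detected samples. That second variable is coded 0 when the sample is undetectable, an arbitrary choice that the indicator absorbs. The follow-up time and death indicator are HYPOTHETICAL simulated placeholders, because the thread gives no survival data. They exist only to make the sketch runnable and must be replaced with the real -stset- variables.

```stata
* Rebuild the tabulated data: 121 undetectable values coded 0, plus 11 detected values
clear
set obs 132
generate double cpies_DNA_max = 0
local detected 2.85 4.721 5.059 5.165 6.267 8.009 9.965 30.538 35.137 50 71.227
local i = 1
foreach v of local detected {
    quietly replace cpies_DNA_max = `v' in `i'
    local ++i
}

* Indicator for "above the detection limit" and the log level among detected samples
generate byte detect = cpies_DNA_max > 0
generate double ln_level = cond(detect, ln(cpies_DNA_max), 0)

* HYPOTHETICAL survival data: simulated placeholders, NOT real follow-up
set seed 12345
generate double followup = -ln(runiform()) * 5
generate byte died = runiform() < 0.3

stset followup, failure(died)
stcox detect ln_level
```

The coefficient on ln_level does not depend on the value assigned to undetectable samples, because any shift in that value is absorbed by the coefficient on detect.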
>
> On Tue, Nov 17, 2009 at 9:01 AM, Eduardo Nunez wrote:
>> Dear statalisters:
>>
>> I wonder if anyone can advise me on the best way to analyse continuous
>> variables with a lower detection limit (i.e., left-censored variables).
>> In particular, I have data on a biomarker with 92% of values reported as
>> "undetectable", and I am trying to run two models:
>> 1) a linear regression using it as the dependent variable, and
>> 2) -stcox- with mortality as the outcome and the biomarker as the main exposure.
>>
>>
>> . tab cpies_DNA_max, m
>>
>> cpies_DNA |
>> _max | Freq. Percent Cum.
>> ------------+-----------------------------------
>> 0 | 121 91.67 91.67
>> 2.85 | 1 0.76 92.42
>> 4.721 | 1 0.76 93.18
>> 5.059 | 1 0.76 93.94
>> 5.165 | 1 0.76 94.70
>> 6.267 | 1 0.76 95.45
>> 8.009 | 1 0.76 96.21
>> 9.965 | 1 0.76 96.97
>> 30.538 | 1 0.76 97.73
>> 35.137 | 1 0.76 98.48
>> 50 | 1 0.76 99.24
>> 71.227 | 1 0.76 100.00
>> ------------+-----------------------------------
>> Total | 132 100.00
>>
>>
>> Censored values occur in environmental, metabolomics and proteomics data,
>> most commonly when the level of a biomarker in a sample is below
>> the limit of quantification of the machine; these values are generally
>> reported as undetectable, with the detection limit (DL)
>> specified (for instance, "< 2.5").
>> Several solutions have been proposed, such as replacing those
>> values with zero, the DL, DL/2, or a random value drawn from a
>> distribution over the range from zero to the DL. However, none of them has
>> been shown to be optimal in simulation studies.
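With 92% of values undetectable, it helps to see how much the choice of substitute matters. The four substitutions can be generated side by side. This sketch assumes the tabulated data and a hypothetical DL of 2.5, taken from the "< 2.5" example above. The variable names are illustrative.

```stata
* Rebuild the tabulated data: 121 undetectable values coded 0, plus 11 detected values
clear
set obs 132
generate double cpies_DNA_max = 0
local detected 2.85 4.721 5.059 5.165 6.267 8.009 9.965 30.538 35.137 50 71.227
local i = 1
foreach v of local detected {
    quietly replace cpies_DNA_max = `v' in `i'
    local ++i
}

local DL = 2.5                   // HYPOTHETICAL detection limit

generate double y_zero = cpies_DNA_max                                   // keep 0
generate double y_dl   = cond(cpies_DNA_max == 0, `DL', cpies_DNA_max)     // DL
generate double y_half = cond(cpies_DNA_max == 0, `DL'/2, cpies_DNA_max)   // DL/2
set seed 2009
generate double y_unif = cond(cpies_DNA_max == 0, `DL'*runiform(), cpies_DNA_max) // U(0, DL)

summarize y_zero y_dl y_half y_unif
```

With these numbers the sample mean ranges from about 1.73 (zeros) through about 2.88 (DL/2) to about 4.03 (DL). Any downstream regression inherits that arbitrariness.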
>> What I don't want is to eliminate those values and run the analysis on
>> complete cases.
>> Is it possible to use multiple imputation to replace those values?
>> If so, how can I tell the imputation method not to produce
>> values above the DL?
>> Is tobit an appropriate model for the first analysis? Because of the marked
>> skewness, should I normalize the variable by transforming only the
>> values above the DL?
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>