Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Gillian.Frost@hsl.gov.uk |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: Interval regression with skewed data |
Date | Fri, 13 Jan 2012 09:58:26 +0000 |
Hello Ronán,

Thank you for your informative reply. I think your approach to dealing with zeros is very sensible, and I will look into using it. It is also reassuring to know that predictions from -intreg- seem reasonable.

Thank you for your help.

Gillian

From: Ronan Conroy <rconroy@rcsi.ie>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Date: 12/01/2012 16:49
Subject: Re: st: RE: Interval regression with skewed data
Sent by: owner-statalist@hsphsun2.harvard.edu

On 10 Jan 2012, at 08:40, <Gillian.Frost@hsl.gov.uk> wrote:

> Nick, I apologise for not being clear in my original posting. My
> outcome/dependent variable is the number of colony forming units per ml,
> and my predictor/independent variable is the region (North West, North
> East, South East England,...) within which the sample was taken.

The approach I use is to express the colony forming units (CFU) in log10 units. I do work with rather contaminated samples, but the approach may well work in your case too.

The problem of zero having no log is resolved when you note that a zero reading means no CFU were detected in 100 ml of sampled water; it does not mean that the water contains no bacteria. For this reason, I define zero CFU as having an upper limit of log10(1) and no lower limit (.).

Likewise, though you may not have seen them yet, you will get water samples where the CFU are too numerous to count, and these can be treated in the mirror-image way, with a lower limit and no upper limit (.). Unlike you, I have worked on datasets in which 40% of the data were so contaminated that the bugs were too numerous to count, and a lot of the world's population is still reliant on water like that!
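In Stata, -intreg- takes two dependent variables holding the lower and upper bound of each observation: a missing lower bound marks a left-censored value, a missing upper bound a right-censored one, and equal bounds an exact observation. The Python sketch below illustrates the coding described above under stated assumptions. The detection limit of 1 CFU follows the post, but the "too numerous to count" ceiling of 300 CFU and the names `cfu_to_intreg_bounds`, `interval_loglik` and `TNTC_LIMIT_CFU` are hypothetical, and `math.nan` stands in for Stata's missing value (.). The second function shows the standard normal interval-regression log-likelihood contribution for each kind of observation; it is a textbook illustration, not Stata's implementation.

```python
import math

# Assumed limits for illustration only. The 1 CFU detection limit mirrors the
# post ("upper limit of log10(1)"); the 300 CFU counting ceiling is hypothetical.
DETECTION_LIMIT_CFU = 1
TNTC_LIMIT_CFU = 300


def cfu_to_intreg_bounds(count, tntc=False):
    """Return (lower, upper) bounds on log10 CFU; NaN plays the role of Stata's '.'."""
    if tntc:
        # Too numerous to count: right-censored, known lower limit, no upper limit.
        return (math.log10(TNTC_LIMIT_CFU), math.nan)
    if count < 0:
        raise ValueError("CFU count cannot be negative")
    if count == 0:
        # Nothing detected: left-censored, no lower limit, upper limit at detection.
        return (math.nan, math.log10(DETECTION_LIMIT_CFU))
    # Counted value: lower and upper bounds coincide (an exact observation).
    value = math.log10(count)
    return (value, value)


def normal_cdf(z):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_loglik(lower, upper, mu, sigma):
    """Log-likelihood contribution of one observation under a normal model."""
    if math.isnan(lower) and math.isnan(upper):
        raise ValueError("at least one bound must be non-missing")
    if math.isnan(lower):
        # Left-censored: P(Y <= upper).
        return math.log(normal_cdf((upper - mu) / sigma))
    if math.isnan(upper):
        # Right-censored: P(Y >= lower).
        return math.log(1.0 - normal_cdf((lower - mu) / sigma))
    if lower == upper:
        # Exact observation: log of the normal density.
        z = (lower - mu) / sigma
        return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    # Genuine interval: P(lower <= Y <= upper).
    return math.log(normal_cdf((upper - mu) / sigma) - normal_cdf((lower - mu) / sigma))


if __name__ == "__main__":
    samples = [(0, False), (25, False), (None, True)]
    for count, tntc in samples:
        lo, hi = cfu_to_intreg_bounds(count, tntc)
        print(count, tntc, lo, hi, interval_loglik(lo, hi, mu=1.0, sigma=0.8))
```

Treating zeros and plate-overflow values as censored, rather than substituting an arbitrary small or large number, lets the likelihood use only what each reading actually tells you: "below the detection limit" or "above the counting limit".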