Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: zero-inflated analyses: when do you decide that is zero-inflated?
From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: st: zero-inflated analyses: when do you decide that is zero-inflated?
Date: Tue, 16 Jul 2013 09:24:47 -0400
Dear Cris,
Since you have the actual size of the skin reaction, a two-part model
seems a good choice. It would be of interest to compare the result of
using actual sizes that are < 3 mm with the result of recoding those
sizes to 0. If those results differ in interesting ways, you could
see what happens with some lower thresholds than 3 mm.
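
As a rough illustration of the threshold comparison suggested above, the following Python sketch recodes sizes below a cutoff to 0 and then summarizes the two parts descriptively: the share of nonzero reactions and the mean size among them. It is only a sketch under stated assumptions. The wheal sizes are made-up numbers, the function names are hypothetical, and a real two-part model would add covariates (for example a logit for the first part and a regression for the second), which this sketch leaves out.

```python
def recode_below(sizes, threshold):
    """Set every size strictly below the threshold to 0; keep the rest."""
    return [0.0 if s < threshold else s for s in sizes]


def two_part_summary(sizes):
    """Return (share of nonzero sizes, mean of the nonzero sizes)."""
    positives = [s for s in sizes if s > 0]
    share_positive = len(positives) / len(sizes)
    mean_positive = sum(positives) / len(positives) if positives else float("nan")
    return share_positive, mean_positive


# Hypothetical wheal diameters in mm (not data from the thread).
wheal_mm = [0, 0, 0, 1, 2, 2.5, 3, 4, 6, 9]

# Threshold 0 leaves the actual sizes untouched; 3 is the conventional cutoff.
summary_by_threshold = {
    t: two_part_summary(recode_below(wheal_mm, t)) for t in (0, 1, 2, 3)
}

for t, (share, mean_size) in summary_by_threshold.items():
    print(f"threshold {t} mm: share nonzero = {share:.2f}, mean nonzero size = {mean_size:.2f}")
```

If the two summaries move a lot between thresholds, that is a hint that the fitted two-part models may also differ in interesting ways.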
In the regression part of the two-part model, you may want to consider
using a transformed scale for the size. For example, the variability
in size may be greater for larger wheals. If so, the square-root
scale or the log scale may be appropriate (either by actual
transformation or by a version of generalized linear models known as
quasi-likelihood, which can be done, as I understand it, with the
-poisson- command).
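
One informal way to judge which scale might suit the sizes is a spread-versus-level check. This is a standard exploratory technique and not something the message itself prescribes. Split the positive sizes into groups, regress log(SD) on log(mean) across groups, and read a slope near 0.5 as pointing to the square-root scale and a slope near 1 as pointing to the log scale (suggested power = 1 - slope). The sketch below assumes hypothetical groups with at least two observations each and positive means; the function names are illustrative.

```python
import math
import statistics


def spread_level_slope(groups):
    """Least-squares slope of log(SD) on log(mean) across groups."""
    xs = [math.log(statistics.mean(g)) for g in groups]
    ys = [math.log(statistics.stdev(g)) for g in groups]
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def suggested_power(slope):
    """Power transformation hinted at by the slope (0 is read as the log)."""
    return 1 - slope


# Hypothetical groups of positive wheal sizes in mm, e.g. by allergen type.
size_groups = [[3, 3.5, 4, 4.5], [4, 5, 7, 8], [5, 8, 11, 16]]
slope = spread_level_slope(size_groups)
print(f"slope = {slope:.2f}, suggested power = {suggested_power(slope):.2f}")
```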
To get a graphical indication of whether a set of frequencies
resembles a Poisson distribution (or, for example, has excess zeros),
you could try the "Poissonness plot" (Hoaglin 1980; Hoaglin and Tukey
1985 --- pardon the shameless plug). The basic version would be easy
to do. The 1985 chapter discusses a similar plot for negative
binomial distributions, once one chooses a value for one of the
parameters.
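
A minimal sketch of the basic Poissonness plot, assuming the usual construction: for observed frequencies n_k summing to N, plot phi_k = ln(k! n_k / N) against k. Under a Poisson distribution with mean lambda the points fall on a line with slope ln(lambda) and intercept -lambda, so a zero point sitting well above the line through the other counts suggests excess zeros. The frequencies and function names below are hypothetical, and the refinements in the 1985 chapter are not attempted.

```python
import math


def poissonness_points(freqs):
    """Return (k, phi_k) pairs, where freqs[k] is the frequency of count k.

    phi_k = ln(k!) + ln(n_k / N). Counts with zero frequency are skipped
    because the logarithm is undefined there.
    """
    total = sum(freqs)
    points = []
    for k, n_k in enumerate(freqs):
        if n_k > 0:
            points.append((k, math.lgamma(k + 1) + math.log(n_k / total)))
    return points


def fit_line(points):
    """Ordinary least-squares (slope, intercept) through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical frequencies of "number of positive tests" (0, 1, 2, ...).
freqs = [60, 15, 12, 8, 4, 1]
points = poissonness_points(freqs)

# Fit the line to k >= 1 only, then see how far the zero point sits above it.
slope, intercept = fit_line([p for p in points if p[0] >= 1])
zero_gap = points[0][1] - intercept
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, zero gap = {zero_gap:.2f}")
```

Plotting the points (k on the horizontal axis, phi_k on the vertical) gives the visual version of the same check.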
David Hoaglin
Hoaglin, D.C. (1980). A Poissonness plot. The American Statistician
34:146-149.
Hoaglin, D.C., and Tukey, J.W. (1985). Checking the shape of discrete
distributions. In Exploring Data Tables, Trends, and Shapes (D.C.
Hoaglin, F. Mosteller, and J.W. Tukey, eds.). New York: Wiley, pp.
345-416.
On Tue, Jul 16, 2013 at 5:55 AM, Cris Dogaru (Oregon State University)
<[email protected]> wrote:
> Dear David,
> I see what you are saying, and you are actually right. Theoretically I
> can still consider it a truncated version (we could have administered
> 10 or 20 skin prick tests to separate allergens), but indeed,
> conceptually my outcome is not a count variable (counting events), but
> rather a set of indicator variables for a latent construct (atopy or
> sensitization); this leaving aside that the decision for a "positive"
> test is arbitrary (skin reaction is 3mm in diameter or larger). The
> tests are indeed associated, as one would actually expect. From the
> literature (using factor analysis technique), they tend to cluster
> (indoor, outdoor, food, inhaled, etc. allergens).
>
> I will settle, probably, for a two-part model, as Peter Lachenbruch
> suggests, but I will do it for each test individually, taking the
> actual size of the skin reaction, in mm. There's plenty of zeros (and
> I can recode those <3 mm to 0 as well, to stick with the commonly used
> threshold), so I will have a two-part model with a logit/regress
> combination (I can use the user-written tpm program).
>
> One of the co-authors suggested analyzing "number of positive tests",
> and that got me into the negative binomial/Poisson approaches. An
> ordinal logit model seems more appropriate indeed.
>
> Many thanks
>
> Cristian Dogaru