Dear StataFolk,
We are analyzing a short (36-month) time-series data set in which:
1. the dependent variable is the prevalence of a contaminant in foodstuffs
over time (the proportion of positive detections among all examinations each
month), and there is no significant autocorrelation evident in the series.
2. the independent variables are a mix of continuous and dummy variables
that also vary over the time series.
Following the Statalist thread "proportion as a dependent variable" from
July 2003, in which Roger Newson made some recommendations (July 14), we are
using -glm- with [family(binomial) link(identity) robust] to model the data.
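
For concreteness, here is a minimal sketch of the kind of call we mean. The
variable names (npos, nexam, temp, season, month) are hypothetical stand-ins
for our data, not the real names, and the sketch assumes the monthly counts
are already in memory.

```stata
* Hypothetical variables (assumptions, not our real names):
*   npos   = number of positive detections in the month
*   nexam  = number of examinations in the month
*   temp   = a continuous covariate
*   season = a 0/1 dummy covariate
*   month  = time index, 1 to 36
generate double prop = npos/nexam
tsset month

* Proportion as the response, identity link, robust standard errors
glm prop temp season, family(binomial) link(identity) robust

* Alternative that keeps the monthly denominators in the model:
* glm npos temp season, family(binomial nexam) link(identity) robust
```

With the identity link the linear predictor is the fitted proportion itself,
so nothing in the model constrains the fitted values to lie between 0 and 1.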
Two questions:
1. How does one interpret the coefficients (one of the dummy variables has a
significant coefficient over 1.0)?
2. The diagnostics with "deviance" residuals look very strange (they are
basically clustered in three strata, with little variation within each
stratum). Is this an indication of a poorly fitting model, or should we be
using Pearson or other residuals?
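
To make the second question concrete, this is roughly how the residuals could
be produced and compared. It is a sketch only: the variable names are
hypothetical, and the -glm- line is repeated just so that -predict- has an
estimation result to work from.

```stata
* Hypothetical variable names, as an illustration only
generate double prop = npos/nexam
glm prop temp season, family(binomial) link(identity) robust

* Fitted proportions and two kinds of residuals from the same fit
predict double mu, mu
predict double rdev, deviance
predict double rpea, pearson

* Check whether any fitted proportions fall outside [0,1]
summarize mu

* Residuals against fitted values, one plot per residual type
scatter rdev mu
scatter rpea mu
```

If both plots show the same three bands, the banding presumably reflects the
fitted values (for example, the levels of the dummy variables) rather than
the choice of residual.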
This, of course, opens the wider question of how to perform diagnostics when
forcing continuous data such as proportions into a binomial family model.
Would the experts be able to offer advice?
We're using Stata SE 8.1.
Many thanks,
Steve Rothenberg