Thanks to Steven for the plug, but his recommendation of -cp- may be
problematic. My own -cp- was long since broken by Stata's own use of
-cp- as a synonym for -copy-. There is a -cpr- somewhere that gets round
that, but for the purposes here it may be better to look at my
-allpossible- or -selectvars-. Use -findit- for locations.
Steven Samuels
Cynthia:
I advise you to look at examples of negative binomial regression in a
good text. But, to briefly answer your questions:
1. Negative binomial regression (-nbreg-) and its extensions -zinb-
(zero-inflated) and -ztnb- (positive counts only) fit a model to the
log of the mean, not to the mean. So, the signs and relative
magnitudes of coefficients should be comparable with those from
multiple regression. Wherever they differ, I would believe the count
data model. Unlike multiple regression, -nbreg- accommodates
differences in the potential size of observations through an
"exposure" or "offset" variable; a more populous census tract would
have more physicians than a smaller tract, for example, so one would
standardize for population size.
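As a rough sketch of the exposure idea (the variable names physicians,
population, x1 and x2 are hypothetical, not from your data):

* population as exposure: the coefficient on ln(population) is fixed at 1
nbreg physicians x1 x2, exposure(population)
* the same model written with an offset, which must already be logged
generate lnpop = ln(population)
nbreg physicians x1 x2, offset(lnpop)

Either form models the count per unit of population rather than the
raw count.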
2. In the learning sample, you can use all of Stata's facilities for
choosing a subset of "best" models. Compare the fits of ordinary
-nbreg- to -zinb-. Check goodness of fit by using -linktest-
(especially useful with continuous variables) and by comparing the
observed to predicted counts. Use robust standard errors for
inference. Choose transformations of continuous predictors with
-fracpoly- or -mfp-. Select from "all possible combinations" of sets
of predictors with a command like Nick Cox's -cp- (available from
SSC). Compare alternative models by BIC, using the -estat- command.
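A sketch of some of these steps, assuming a count outcome y, predictors
x1-x3, and a 0/1 variable learn marking the learning sample (all
hypothetical names; the choice of inflate() variables is only
illustrative):

* ordinary negative binomial fit, stored for later comparison
nbreg y x1 x2 x3 if learn
estimates store m_nb
* specification check, run directly after the fit
linktest
* zero-inflated alternative
zinb y x1 x2 x3 if learn, inflate(x1 x2)
estimates store m_zinb
* AIC and BIC for both models side by side
estimates stats m_nb m_zinb
* refit the chosen model with robust standard errors for inference
nbreg y x1 x2 x3 if learn, vce(robust)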
3. To apply your best models to the validation sample, predict for
observations not used to create the estimates. Here's an example:
sysuse auto
reg mpg weight if foreign
predict yhat if !foreign // predicts for the other observations
Search also for "esample" to see another way of getting out-of-sample
predictions.
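For the count model, the e(sample) route might look like this (y,
x1-x3, and learn are hypothetical names):

nbreg y x1 x2 x3 if learn
* the default prediction after -nbreg- is the predicted number of events
predict muhat if !e(sample)

Note that !e(sample) also picks up any learning-sample observations
dropped from the fit because of missing values, so check the counts.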
4. Compare observed counts for your validation sample to those
predicted by the learning sample. As a measure of "closeness" you
might use a chi square statistic, divided by sample size. A rank
correlation could also work, but others may suggest better approaches.
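One possible way to compute these, assuming y holds the observed
counts, muhat the predictions from the learning-sample model, and
learn a 0/1 indicator of the learning sample (hypothetical names):

* per-observation chi-square contributions in the validation sample
generate chi2_i = (y - muhat)^2 / muhat if !learn
* their mean is the chi-square statistic divided by sample size
summarize chi2_i
* rank correlation of observed and predicted counts
spearman y muhat if !learn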
You don't say much about your data (whether they were weighted,
clustered, or in panel form), so I haven't covered all bases. Still, I
hope this gives you a start.
-Steve
On Jun 17, 2008, at 2:20 PM, Cynthia Lokker wrote:
> Hi,
> I have a set of data with my dependent variable being a count and
> with 19 independent variables. I originally performed a multiple
> regression on a 60% subset (n=757) and validated the model on the
> remaining 40% (n=504).
> It has since been brought to my attention to use a negative binomial
> regression since this fits my data better. I would now like to
> repeat the analysis and compare the general findings of the nbreg
> with the former multiple regression (magnitude of coefficients etc).
> I have the following questions:
> 1. Is it feasible to compare (generally) the 2 types of analysis?
> 2. Can I validate my nbreg model in the same way as I did with the
> multiple regression?
> 3. What Stata commands would I need to use to do #2?