Joe,
I believe everything depends on whether your Poisson model includes an
offset. This is explained in the help system if you type
help poisson postestimation, marker(predict)
If the model doesn't have an offset, then |predict ir, ir| gives the fitted
count. This is the same as |predict mu| (with no options), and the same as
|gen mu = exp(xb)| after |predict xb, xb|.
If the model has an offset, then |predict ir, ir| gives the predicted rate,
which is the number of events divided by exposure. This is not the same as
|predict mu|, which gives the expected count taking exposure into account.
Note also that |predict xb, xb| gives the linear predictor *including* the
offset. Another way to obtain the incidence rate is |gen rate =
events/exposure| after |predict events|, assuming your exposure variable is
called exposure.
In either case it is possible for the rate to exceed one; you could expect
more than one event (no offset) or more than one event per unit of exposure
(with offset). For example the number of children ever born per woman can be
more than one.
Cheers,
German
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Friday, October 07, 2005 12:10 PM
To: [email protected]
Subject: st: Re: predict after Poisson
A couple of days ago Rich Goldstein posted a note concerning values greater
than 1 when using the -predict x, ir- command following a Poisson
regression.
He wondered if values over 1 were correct. I checked the output of several
of the poisson postestimation predict options and discovered that there may
be
a problem with Stata's output. I used a user written program that I wrote
to
double check results, but the findings could be obtained without the check.
Rich responded that the output he obtains after the use of -predict, ir-
following -poisson- is different than I get when using -poisson-. This is
more
disturbing. Not only does there appear to be a possible inconsistency in
predict's output following the poisson command, it seems that predict
output
differs between Rich's and my programs. I am using version 9.1, I do not
know
which version Rich is using. But it should not make a difference.
Perhaps someone else has a solution to our findings, showing that there
really is no inconsistency. With Rich's OK, our correspondence is listed
below.
Joe Hilbe
=======
Rich:
I looked at the problem you identified. First I checked the predict
help file for poisson postestimations. It shows the options for predict
as below. I used the cancer data set, modeled it, and employed the
predict command with the n, ir, and xb options. I also simply used the
command -predict mu-. Thereafter I list the predictions for the first 5
observations. Note that n, ir, and mu are all the same, which is incorrect.
n and ir should differ. I next modeled the same with the poisson program
I wrote to use with my short courses. I like to at least have the AIC
statistic
displayed so that I can more easily compare models. Anyhow, the stats come
out
correctly. The base predict command should give the linear predictor, with
the
inverse link providing the fitted value. This is the case of all ML
programs. Of course,
the predict command after poisson has been given different, and more
extensive,
options.
The fitted value, mu, with my program, matched the values of n, ir, and mu
for
the use of predict following Stata's -poisson- command. Both -xb-'s were
the same - as they should be.
My original thought about the -ir- option yielding the fitted value was
correct.
That's why you obtained values in excess of 1.0, which was to be expected.
Perhaps Stata's tech support needs to look at the Poisson postestimation
commands.
It would be helpful if someone reading this try out the model and the
predict commands to
see if their results are the same.
My output is below: Rich's final remark of this morning is below that.
Joe Hilbe
000000000000000000000000000000000000000000000000000000000000000000000000
help poisson postestimation
..
Syntax for predict
predict [type] newvar [if] [in] [, statistic nooffset]
statistic description
----------------------------------------------------------------------
Main
n predicted number of events; the default
ir incidence rate exp(xb)
xb linear prediction
stdp standard error of the linear prediction
score first derivative of the log likelihood with respect to xb
----------------------------------------------------------------------
STATA'S POISSON MODEL AND POSTESTIMATION
========================================
. use cancer
. poisson time died age d2 d3, nolog irr
Poisson regression Number of obs =
48
LR chi2(4) = 160.82
Prob > chi2 = 0.0000
Log likelihood = -188.28729 Pseudo R2 =
0.2993
----------------------------------------------------------------------------
--
time | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+--------------------------------------------------------------
--
died | .9237044 .081715 -0.90 0.370 .7766618 1.098586
age | .9700189 .0071361 -4.14 0.000 .9561329 .9841067
d2 | 1.618461 .184698 4.22 0.000 1.294087 2.024142
d3 | 2.586837 .2674132 9.19 0.000 2.112402 3.167829
----------------------------------------------------------------------------
--
. predict cnt, n
. predict ircnt, ir
. predict mu
(option n assumed; predicted number of events)
. predict xb, xb
. l cnt ircnt mu xb in 1/5
+-------------------------------------------+
| cnt ircnt mu xb |
|-------------------------------------------|
1. | 7.609622 7.609622 7.609622 2.029413 |
2. | 6.737269 6.737269 6.737269 1.907655 |
3. | 8.087283 8.087283 8.087283 2.090293 |
4. | 10.00786 10.00786 10.00786 2.30337 |
5. | 8.860577 8.860577 8.860577 2.181612 |
+-------------------------------------------+
USER WRITTEN POISSON PROGRAM, VERSION 9.1
======================================================
. drop cnt ircnt mu xb
. jhpoisson time died age d2 d3, nolog irr
Poisson Regression Number of obs =
48
Wald chi2(4) = 154.82
Log likelihood = -188.28729 Prob > chi2 =
0.0000
----------------------------------------------------------------------------
--
time | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+--------------------------------------------------------------
--
died | .9237044 .081715 -0.90 0.370 .7766618 1.098586
age | .9700189 .0071361 -4.14 0.000 .9561329 .9841067
d2 | 1.618461 .184698 4.22 0.000 1.294087 2.024142
d3 | 2.586837 .2674132 9.19 0.000 2.112402 3.167829
----------------------------------------------------------------------------
--
AIC Statistic = 8.054 BIC Statistic =
3.186
Deviance = 169.648 Dispersion =
3.945
LM Value = 6348.667 LM Chi2(1) =
0.000
Score test OD = 123.055 Score Chi(1) =
0.000
. predict xb, xb
. gen mu=exp(xb)
. l xb mu in 1/5
+---------------------+
| xb mu |
|---------------------|
1. | 2.029413 7.609622 |
2. | 1.907655 6.73727 |
3. | 2.090293 8.087282 |
4. | 2.30337 10.00786 |
5. | 2.181612 8.860576 |
+---------------------+
NOTE: Our xb and mu results are identical. The question remains about the
values following
-predict xn, n- and -predict xir, ir-, which are the same as -mu- when I
model it with -poisson-.
000000000000000000000000000000000000000000000000000000
Joe,
Note that in my case I did NOT get the same predictions using:
predict newvar
and
predict newvar, ir.
When using no option, the max predicted value was just over 9. When using
the
ir option, the max predicted value was about 1.13.
Rich
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/