Kaleb Michaud <[email protected]> asks about discrepancies he sees
between the output of -adjust- and -tabstat- after -logistic-.
Using the census data shipped with Stata, he runs
. gen longer = length(state)
. recode longer min/8=0 9/max=1
. logistic longer medage
. predict p1 if e(sample), pr
. adjust, by(region) pr gen(a1)
----------------------------------------------------------
     Dependent variable: longer     Command: logistic
       Created variable: a1
    Variable left as is: medage
----------------------------------------------------------

----------------------
   Census |
   region |         pr
----------+-----------
       NE |    .433217
  N Cntrl |     .37812
    South |    .381077
     West |    .339917
----------------------
     Key:  pr  =  Probability
. tabstat p1 a1 medage, by(region)
Summary statistics: mean
  by categories of: region (Census region)

 region |        p1        a1    medage
--------+------------------------------
     NE |  .4335186  .4335186  31.23333
N Cntrl |  .3783487  .3783487    29.525
  South |  .3821012  .3821012  29.61875
   West |  .3418868  .3418868  28.28462
--------+------------------------------
  Total |       .38       .38     29.54
---------------------------------------
He wonders whether there is a problem, since the table produced by
-adjust- shows values such as .433217 instead of .4335186 (for
the NE region, as an example).
I can explain what is happening, but first here is some
background. For logistic regression we go from the linear
prediction x (the -xb- option of -predict-) to a probability using
the formula
exp(x)/(1+exp(x))
What you are seeing in the output of -tabstat- is
average( exp(x)/(1+exp(x)) )
for each region. What you are seeing in -adjust- is
exp(average(x))/(1+exp(average(x)))
for each region. Because the transformation is nonlinear, the two
are generally not equal.
In other words, do you want to see the average of the
probabilities, or the probability at the average linear prediction?
Depending on which you want, you will use the output from -tabstat-
or -adjust-.
Just to make it more concrete, let's examine how the numbers for
the NE region (.4335186 shown by -tabstat- and .433217 shown by
-adjust-) can be produced. Here is output from Stata 8 (notice
my use of the -sysuse- command, which was added in Stata 8).
. sysuse census, clear
. gen longer = length(state)
. recode longer min/8=0 9/max=1
. quietly logistic longer medage
. predict double p1 , pr
. predict double xb, xb
. summarize p1 if region==1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p1 |         9    .4335186     .0333675    .374191   .4652383
The above is the mean of the probabilities and agrees with what
-tabstat- shows.
. summarize xb if region==1
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          xb |         9   -.2687384     .1370751 -.5142789  -.1392714
. di r(mean)
-.26873836
The -.26873836 is the mean of the linear predictions. The corresponding
probability for the mean prediction is
. di exp(r(mean))/(1+exp(r(mean)))
.43321685
This is what -adjust- is producing (it displays the value as .433217).
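The same comparison can be made for every region at once. What follows is only a sketch, not something taken from the session above: the variable names p, mean_p, mean_xb, and p_at_mean are made up for illustration, and it assumes -egen-'s mean() function with the by() option. For NE the two columns should come out at roughly .4335 and .4332, in line with the numbers already shown.

```stata
* Sketch only: p, mean_p, mean_xb, and p_at_mean are hypothetical names.
sysuse census, clear
gen longer = length(state)
recode longer min/8=0 9/max=1
quietly logistic longer medage

* Linear prediction and the probability implied by it
predict double xb, xb
gen double p = exp(xb)/(1+exp(xb))

* Average of the probabilities within each region (the -tabstat- number)
egen double mean_p = mean(p), by(region)

* Probability at the average linear prediction (the -adjust- number)
egen double mean_xb = mean(xb), by(region)
gen double p_at_mean = exp(mean_xb)/(1+exp(mean_xb))

* Both quantities side by side, one row per region
tabstat mean_p p_at_mean, by(region) format(%9.6f)
```

The two columns differ only because the averaging happens before or after the logistic transformation; nothing else in the model changes.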
I hope this clarifies the situation for you.
Ken Higbee [email protected]
StataCorp 1-800-STATAPC