Stata has two versions of the AIC statistic: one used with -glm- and
another with -estat ic-. The -estat ic- version does not adjust
the log-likelihood and penalty term by the number of observations in
the model, whereas the version used in -glm- does.
ESTAT IC

    AIC = -2*LL + 2*k  =  -2(LL - k)

GLM

          -2*LL + 2*k     -2(LL - k)
    AIC = -----------  =  ----------
               n               n
where LL is the model log-likelihood, k is the number of predictors,
and n is the number of observations. 2k is a penalty term, adjusting
for the number of predictors in the model. A larger n inflates -2*LL;
dividing by n adjusts the statistic to yield a per-observation
contribution to the adjusted -2*LL. That is, the version used in
-glm- adjusts for sample size.
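To make the distinction concrete, here is a minimal sketch of the two
AIC formulas in Python (Python is used only for illustration; the
plugged-in values are from the -glm- run shown below):

```python
def aic_estat(ll: float, k: int) -> float:
    """AIC as displayed by -estat ic-: -2*LL + 2*k."""
    return -2 * ll + 2 * k

def aic_glm(ll: float, k: int, n: int) -> float:
    """AIC as displayed by -glm-: the same quantity, divided by n."""
    return aic_estat(ll, k) / n

# LL = -30.17249165, k = 3, n = 74 (the logistic model on auto.dta below)
print(round(aic_estat(-30.17249165, 3), 5))    # 66.34498
print(round(aic_glm(-30.17249165, 3, 74), 7))  # 0.8965538
```

Both numbers match the output below: 66.34498 is what -estat ic-
reports, and .8965538 is the AIC shown in the -glm- header.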
Note that -estat ic- uses a particular version of the BIC statistic
that is based on the LL. The original version proposed by Raftery in
1986 is based on the deviance. -glm- uses the original version, hence
the discrepancy in displayed values.
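The two BIC variants can likewise be sketched in Python (illustrative
only; the plugged-in values are from the first -glm- run below):

```python
import math

def bic_ll(ll: float, k: int, n: int) -> float:
    """LL-based BIC used by -estat ic-: -2*LL + k*ln(n)."""
    return -2 * ll + k * math.log(n)

def bic_dev(deviance: float, resid_df: int, n: int) -> float:
    """Raftery's original deviance-based BIC, as displayed by -glm-:
    D - df_resid * ln(n)."""
    return deviance - resid_df * math.log(n)

# LL = -30.17249165, k = 3, n = 74; deviance = 60.3449833, resid df = 71
print(round(bic_ll(-30.17249165, 3, 74), 5))  # 73.25718
print(round(bic_dev(60.3449833, 71, 74), 4))  # -245.2436
```

73.25718 matches the BIC reported by -estat ic-, and -245.2436 matches
the BIC in the -glm- header, which is the discrepancy described above.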
Regardless, for several of my publications I developed two programs
that calculate the AIC and BIC statistics following a Stata maximum
likelihood or GLM command. Look at the difference between the two
versions of AIC when applied to a simple logistic regression:
. use auto,clear
(1978 Automobile Data)
. glm foreign mpg length, nolog fam(bin)
Generalized linear models                         No. of obs      =        74
Optimization     : ML                             Residual df     =        71
                                                  Scale parameter =         1
Deviance         =  60.3449833                    (1/df) Deviance =  .8499293
Pearson          =  54.91238538                   (1/df) Pearson  =  .7734139

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =  .8965538
Log likelihood   = -30.17249165                   BIC             = -245.2436
------------------------------------------------------------------------------
             |                 OIM
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0988457   .0784404    -1.26   0.208    -.2525861    .0548946
      length |  -.1051447   .0295657    -3.56   0.000    -.1630923    -.047197
       _cons |   20.43339   6.700286     3.05   0.002     7.301072    33.56571
------------------------------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+---------------------------------------------------------------
           . |     74           .   -30.17249      3   66.34498   73.25718
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. aic
AIC Statistic = .8965538 AIC*n = 66.344983
BIC Statistic = -245.2436
. abic
AIC Statistic = .8965538 AIC*n = 66.344986
BIC Statistic = .9045494 BIC(Stata) = 73.257179
** -aic- calculates both versions of the AIC, and the deviance-based
BIC. Note that it is consistent with the displayed -glm- values.
** -abic- gives the same two versions of the AIC, and the same BIC used
by -estat ic-. The BIC on the left side is that used in the LIMDEP
econometric software. It adjusts for sample size as well.
. expand 2
(74 observations created)
. glm foreign mpg length, nolog fam(bin)
Generalized linear models                         No. of obs      =       148
Optimization     : ML                             Residual df     =       145
                                                  Scale parameter =         1
Deviance         =  120.6899666                   (1/df) Deviance =  .8323446
Pearson          =  109.8247708                   (1/df) Pearson  =  .7574122

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =  .8560133
Log likelihood   = -60.3449833                    BIC             = -603.9058
------------------------------------------------------------------------------
             |                 OIM
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0988457   .0554657    -1.78   0.075    -.2075566    .0098651
      length |  -.1051447   .0209061    -5.03   0.000    -.1461198   -.0641695
       _cons |   20.43339   4.737818     4.31   0.000     11.14744    29.71934
------------------------------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    148           .   -60.34498      3     126.69   135.6816
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. aic
AIC Statistic = .8560133 AIC*n = 126.68997
BIC Statistic = -603.9058
. abic
AIC Statistic = .8560133 AIC*n = 126.68996
BIC Statistic = .8600111 BIC(Stata) = 135.68161
***
Note the enlarged AIC statistic when using -estat ic-, but not when
using the AIC used in -glm-. Also note the near-constancy of the
LIMDEP BIC statistic when the data were expanded.
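The behavior under -expand 2- follows directly from the formulas:
duplicating every observation doubles the log-likelihood, so the
unscaled AIC roughly doubles while the per-observation version stays
nearly constant. A quick check in Python (illustrative only, using the
log-likelihood from the original n = 74 run):

```python
def aic_estat(ll, k):
    # AIC as displayed by -estat ic-: -2*LL + 2*k
    return -2 * ll + 2 * k

def aic_glm(ll, k, n):
    # AIC as displayed by -glm-: the same quantity divided by n
    return aic_estat(ll, k) / n

ll, k = -30.17249165, 3           # original auto data, n = 74
# duplicating every observation doubles the log-likelihood
for n, LL in [(74, ll), (148, 2 * ll)]:
    print(n, round(aic_estat(LL, k), 5), round(aic_glm(LL, k, n), 7))
# the -estat ic- AIC roughly doubles (66.34498 -> 126.68997),
# while the per-observation -glm- AIC barely moves (.8965538 -> .8560133)
```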
By adjusting for the number of observations in the model, the AIC can
better be used as a comparative fit statistic, regardless of whether
there is a difference in sample sizes. This was the intent of the
statistic in the first place.
Also be aware that there have been other versions of the AIC. Among
them are the finite-sample AIC, the Schwarz AIC, and the LIMDEP AIC.
Each of these has an explicit adjustment for sample size, unlike the
version used in -estat ic-.
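For reference, the finite-sample AIC (often written AICc) adds an
extra small-sample penalty to the unscaled AIC; a sketch, assuming k
counts all estimated parameters (conventions on what k includes
differ, so check against your software):

```python
def aicc(ll, k, n):
    """Finite-sample (corrected) AIC: AIC + 2k(k+1)/(n - k - 1)."""
    aic = -2 * ll + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

# first model above: LL = -30.17249165, k = 3, n = 74
print(round(aicc(-30.17249165, 3, 74), 5))  # 66.68784
```

The correction term vanishes as n grows, so AICc converges to the
ordinary -estat ic- AIC in large samples.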
I discuss this topic in some detail in my new book, "Logistic
Regression Models", and provide a table of degrees of model preference
based on the difference in AIC values between two models. The
criterion for strength of preference is based on simulation studies.
The table is similar to the one developed by Raftery for his original
version of BIC.
It must be understood that the penalty and observation corrections are
not completely successful in eliminating bias resulting from
additional predictors and differences in observations. But having an
adjustment for sample size appears to me preferable to having none.
Others developing alternatives to the traditional AIC statistics
(-estat ic- and -glm-) seem to agree. The primary caveat to be aware
of when using the AIC (-glm-) relates to its use with correlated data.
But that's another discussion.
Joseph Hilbe
=========================================
Date: Tue, 23 Jun 2009 22:20:36 -0500
From: Richard Williams <[email protected]>
Subject: RE: st: Model selection using AIC/BIC and other information criteria
At 08:39 PM 6/23/2009, kokootchke wrote:
Thank you, Richard. This was exactly what I thought... but I remember
from my metrics classes a long time ago that both AIC and BIC depend
on N (sample size)... and I confirmed this by simply looking at these
wikipedia entries... but, just like you, I also feared that, even
though both criteria adjust for the sample size, maybe you can't
compare between AICs and BICs when the models use different # of
observations...
Here is a simple example that shows the sensitivity of BIC and AIC to
sample size:
. sysuse auto, clear
(1978 Automobile Data)
. quietly reg price mpg trunk weight
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+---------------------------------------------------------------
           . |     74   -695.7129   -682.6073      4   1373.215   1382.431
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
. expand 2
(74 observations created)
. quietly reg price mpg trunk weight
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df        AIC        BIC
-------------+---------------------------------------------------------------
           . |    148   -1391.426   -1365.215      4   2738.429   2750.418
-----------------------------------------------------------------------------
Note: N=Obs used in calculating BIC; see [R] BIC note
So, even if data are missing at random with your X variable, the
smaller sample sizes that result from its inclusion will drive down
the BIC and AIC stats quite a bit.
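The doubling in the tables above follows mechanically from the
definitions: duplicating the data doubles LL, so the new AIC is
2*AIC - 2k and the new BIC is -2*(2*LL) + k*ln(2n). A quick Python
check using the rounded values from the first table (illustrative
only):

```python
import math

k, n = 4, 74
aic_n = 1373.215                 # AIC from the first -estat ic- table
ll = -(aic_n - 2 * k) / 2        # recover LL (about -682.6075 from rounded input)

aic_2n = -2 * (2 * ll) + 2 * k           # equals 2*aic_n - 2*k
bic_2n = -2 * (2 * ll) + k * math.log(2 * n)
print(round(aic_2n, 2), round(bic_2n, 2))  # 2738.43 2750.42
```

Both agree (to rounding) with the second table: AIC = 2738.429 and
BIC = 2750.418.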
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: [email protected]
WWW: http://www.nd.edu/~rwilliam