I have two variables:
(1) outagecost (the estimated cost to each customer of a short electrical
power interruption)
(2) mwhannual (annual megawatt-hours of electricity consumption for each
customer)
Since these variables appear approximately lognormal, I have been
estimating the following simple model:
reg lnoutagecost lnmwhannual
where lnoutagecost and lnmwhannual are the natural logs of the two
variables described above. The results are:
. reg lnoutagecost lnmwhannual

      Source |       SS       df       MS              Number of obs =   32345
-------------+------------------------------           F(  1, 32343) = 9370.20
       Model |  34151.9301     1  34151.9301           Prob > F      =  0.0000
    Residual |  117881.722 32343   3.6447368           R-squared     =  0.2246
-------------+------------------------------           Adj R-squared =  0.2246
       Total |  152033.652 32344  4.70052104           Root MSE      =  1.9091

------------------------------------------------------------------------------
lnoutagecost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnmwhannual |   .3824726   .0039512    96.80   0.000     .3747282    .3902171
       _cons |   5.370938   .0232302   231.21   0.000     5.325406     5.41647
------------------------------------------------------------------------------

I then tried the following model in glm, which I had expected to produce
identical results:
glm outagecost lnmwhannual, link(log)

Generalized linear models                          No. of obs      =     52418
Optimization     : ML                              Residual df     =     52416
                                                   Scale parameter =  7.59e+09
Deviance         =  3.97873e+14                    (1/df) Deviance =  7.59e+09
Pearson          =  3.97873e+14                    (1/df) Pearson  =  7.59e+09

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =   25.5881
Log likelihood   = -670636.5416                    BIC             =  3.98e+14

------------------------------------------------------------------------------
             |                 OIM
  outagecost |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnmwhannual |   .5568004   .0130092    42.80   0.000     .5313029    .5822979
       _cons |   5.355758   .1384432    38.69   0.000     5.084414    5.627102
------------------------------------------------------------------------------

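The results are similar, but not identical: the constants are close
(5.371 vs. 5.356), while the coefficient on lnmwhannual is .382 from reg
and .557 from glm. The number of observations also differs (32345 vs.
52418).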
I read the Stata manual section on glm and checked a large number of
Statalist posts related to log-linear models, but I was not able to
understand exactly why glm with link(log) doesn't produce the same
results as logging both variables and using reg. Based on my reading
of the Stata manual, it appears to have something to do with the fact that
the link() option relates to the expectation of the dependent variable,
not to the dependent variable itself. Can anyone tell me why the results
are different?
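
To make my reading of the manual concrete, here is how I would write the
two models. This is only my tentative interpretation, not something taken
from the manual, and the notation (y for outagecost, x for mwhannual, and
the coefficient symbols) is introduced here purely for illustration.

```latex
% Notation assumed for illustration: y = outagecost, x = mwhannual.
\begin{align}
  \text{reg on logged variables:} \quad
    & E[\ln y \mid x] = \alpha + \beta \ln x \\
  \text{glm with link(log):} \quad
    & \ln E[y \mid x] = \alpha^{*} + \beta^{*} \ln x \\
  \text{Jensen's inequality:} \quad
    & E[\ln y \mid x] \le \ln E[y \mid x]
\end{align}
```

If this is right, reg models the mean of the logged outcome while glm
models the log of the mean outcome, so the two sets of coefficients need
not coincide. I would appreciate confirmation of whether this is the whole
explanation.
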
Matthew G. Mercurio, Ph.D.
Senior Consultant
Freeman, Sullivan & Co.