The issue as I understand it for response y arises because the mean of
log(y) differs from the log of mean(y). What you do to the predictors is
immaterial. The problem is generic to any nonlinear transformation.
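A minimal sketch of the point, assuming the auto dataset shipped with Stata stands in for your data and price plays the role of y. The scalar names mean_y and gm_y are illustrative only.

```stata
* Illustration only: auto.dta and price are stand-ins for your own data
sysuse auto, clear
generate double lnprice = ln(price)

* arithmetic mean of y
summarize price, meanonly
scalar mean_y = r(mean)

* back-transformed mean of log(y), i.e. the geometric mean of y
summarize lnprice, meanonly
scalar gm_y = exp(r(mean))

display "mean(y)           = " scalar(mean_y)
display "exp(mean(log(y))) = " scalar(gm_y)
```

The second number is the geometric mean, which cannot exceed the arithmetic mean, so naive exponentiation of a mean on the log scale understates the mean on the original scale.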
I see there being two main relatively simple ways of tackling this
problem. (There are other more complicated methods; my experience, such
as it is, indicates that they don't give very different results except
when results are highly dubious anyway.)
1. Avoid it altogether by using -glm- with appropriate link.
2. Use smearing.
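The two routes can be sketched as follows. This is a hedged illustration, not a recommendation of a particular model: the auto dataset, the predictors weight and foreign, and the variable names price_glm, xb, res, expres and price_smear are all assumptions. With no family() option -glm- uses the Gaussian family; whether that is appropriate depends on your data.

```stata
* Illustration only: data, predictors and new variable names are assumptions
sysuse auto, clear
generate double lnprice = ln(price)

* Route 1: model the mean of price directly through a log link
glm price weight foreign, link(log)
predict double price_glm, mu

* Route 2: regress log(price), then smear by hand
regress lnprice weight foreign
predict double xb, xb
predict double res, residuals
generate double expres = exp(res)
summarize expres, meanonly
scalar cf = r(mean)                       // smearing factor
generate double price_smear = exp(xb) * scalar(cf)

summarize price price_glm price_smear
```

Both price_glm and price_smear are predictions in the original metric; comparing them with price gives a rough check on whether the two routes agree for your data.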
Richard Goldstein implemented -predlog- in 1996, which includes
smearing.
STB-29  sg48  . . Predictions in the original metric for log-transformed models
        (help predlog if installed) . . . . . . . . . . . . . . .  R. Goldstein
        1/96    pp.27--29; STB Reprints Vol 5, pp.145--147
        calculates three different retransformations, which allow
        obtaining predictions in the original metric
Both the software and the original article are accessible to all.
You can almost do smearing by hand, but here is a slightly more polished
version of doing it by hand.
*! NJC 2.1.0 8 January 2005
* NJC 1.0.0 13 September 2002
program smear, rclass
    version 8.0
    syntax [if] [in] [, Generate(str) OUTofsample ]

    if "`generate'" != "" {
        capture confirm new variable `generate'
        if _rc {
            di as err "option syntax is generate(newvar)"
            exit _rc
        }
    }

    marksample touse
    qui count if `touse'
    if r(N) == 0 error 2000

    tempvar resid yhatraw
    tempname rmse cf

    qui {
        * will exit with error message if no estimates
        scalar `rmse' = e(rmse)

        * predictions on the log scale
        if "`outofsample'" != "" predict double `yhatraw'
        else predict double `yhatraw' if e(sample)

        * smearing factor: mean of the exponentiated residuals
        predict double `resid', res
        replace `resid' = exp(`resid')
        su `resid', meanonly
        scalar `cf' = r(mean)

        if "`generate'" != "" {
            gen double `generate' = exp(`yhatraw') * `cf' if `touse'
            la var `generate' "smeared retransformation"
        }
    }

    di as res scalar(`cf')
    return scalar smearcf = `cf'
end
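A short usage sketch, assuming the program above has already been run or saved as smear.ado on your adopath, and again using the auto dataset purely as a stand-in. The scalar name cf is illustrative.

```stata
* Assumes -smear- (the program above) is already defined
sysuse auto, clear
generate double lnprice = ln(price)
regress lnprice weight foreign

* displays the smearing factor and leaves it behind in r(smearcf)
smear
scalar cf = r(smearcf)
display "smearing factor = " scalar(cf)
```

According to the syntax line, generate(newvar) additionally stores exp(prediction) times that factor in a new variable, and outofsample extends the predictions beyond the estimation sample.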
There is more discussion in
N.J. Cox, J. Warburton, A. Armstrong and V.J. Holliday. 2008. Fitting
concentration and load rating curves with generalised linear models.
Earth Surface Processes and Landforms 33: 25-39 (doi: 10.1002/esp.1523)
which may be accessible to you.
Nick
Maarten Buis
--- "Loncar, Dejan" wrote:
> I have transformed the variables using log function before
> regression.
>
> Do you know by any chance which function in Stata or some ado file
> can perform antilog transformation after regression with correction
> for bias in regression estimates?
Bias means nothing else than that your estimates don't mean what you
think they mean. So there are two ways of addressing bias: Either you
change interpretation of the results so that the interpretation
corresponds to the estimate, or you change your estimate so that it
measures what you think it does. Another consequence of this is that
there is no such thing as a biased estimate per se: you always need to
specify what the estimate is a biased estimate of. Trivially all
estimates are biased estimates of most concepts (e.g. the annual tea
consumption of Burundi is a biased estimate of the number of ants per
square inch in Amsterdam), and at the same time all estimates are
unbiased estimates of the thing that they measure (but the thing they
measure may not be of interest).
The distinction between changing the interpretation and changing the
estimate is nicely illustrated by looking at a log transformed
dependent variable. If you first transform the dependent variable and
then perform a regular regression, you can interpret the exponentiated
coefficients as ratios of geometric means, but not as ratios of
arithmetic means. You can get estimates in terms of ratios of
arithmetic means when you use -glm- on the untransformed dependent
variable with the -link(log)- option. So if you are interested in the
effect on the geometric mean, then -glm- will provide you with biased
estimates. You can solve this either by changing your interpretation of
the results to the effect in terms of the arithmetic mean or by
estimating your model with -regress- on the log-transformed dependent
variable.
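A minimal sketch of the contrast, assuming the auto dataset as a stand-in and a single binary predictor (foreign) so that both ratios can be checked against the group means directly. The scalar names gm_ratio and am_ratio are illustrative.

```stata
* Illustration only: auto.dta, price and foreign are stand-ins
sysuse auto, clear
generate double lnprice = ln(price)

* exp(b) is a ratio of geometric means of price (foreign vs domestic)
regress lnprice foreign, eform(GM ratio)
scalar gm_ratio = exp(_b[foreign])

* exp(b) is a ratio of arithmetic means of price (foreign vs domestic)
glm price foreign, link(log) eform
scalar am_ratio = exp(_b[foreign])

display "ratio of geometric means  = " scalar(gm_ratio)
display "ratio of arithmetic means = " scalar(am_ratio)
```

With only one binary predictor the two numbers should reproduce the ratio of the group geometric means and the ratio of the group arithmetic means respectively; with more predictors the same interpretation holds conditionally on the other covariates.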
I have discussed a detailed example of this issue here:
http://www.stata.com/statalist/archive/2008-11/msg00137.html
Also see:
Roger Newson (2003) Stata tip 1: The eform() option of regress. The
Stata Journal 3(4): 445.
http://stata-journal.com/article.html?article=st0054