The dependent variable is ln(d/p), where d is the number of deaths and p is
the population. The log of the death rate, ln(d/p), is approximately normally
distributed with variance 1/E(d), where E(d) is the expected number of deaths.
Since E(d) is proportional to p, the variance of ln(d/p) is inversely
proportional to p, the population size, which in my application ranges
from 100,000 to five million. This implies heteroskedasticity.
Observations should therefore be weighted in proportion to the country
population, which amounts to multiplying each observation by the square root
of the population. This is similar to converting OLS into GLS, weighting by
the inverse of the residual variance.
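To make the weighting argument concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the sample size, the coefficients, the Poisson model for deaths, and the variable names pop, lnrate, s, y_s, x1_s and x2_s. It only shows that weights proportional to population give the same fit as ordinary least squares on data scaled by the square root of population.

```stata
* Simulated data; all names and numbers are illustrative, not from the post
clear
set seed 12345
set obs 300
* populations between 100,000 and five million
gen pop = round(100000 + runiform()*4900000)
gen x1 = rnormal()
gen x2 = rnormal()
* deaths drawn as Poisson counts (an assumption)
gen d = rpoisson(pop*exp(-5 + 0.1*x1 - 0.2*x2))
gen lnrate = ln(d/pop)

* weighted regression, weights proportional to population
regress lnrate x1 x2 [aweight=pop]

* the same estimator by hand: scale every variable, and the constant, by sqrt(pop)
gen s = sqrt(pop)
gen y_s = lnrate*s
gen x1_s = x1*s
gen x2_s = x2*s
regress y_s x1_s x2_s s, noconstant
```

The coefficients on x1_s, x2_s and s in the second regression should match those on x1, x2 and the constant in the first, which is the sense in which weighting by population and scaling by its square root are the same thing.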
My question is whether this is correct:
gen lnrate = ln(d/p)
regress lnrate x1 x2 [w=pop]
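How about this alternative way to do it?
* lnrate is ln(d/p), created beforehand with: gen lnrate = ln(d/p)
xi: regress lnrate x1 x2 i.country i.year di1-di15
predict e, residuals
gen esq = e^2
* model the squared residuals to estimate each observation's variance
xi: regress esq x1 x2 i.country i.year di1-di15
predict v
* reweight by the inverse of the fitted variance
xi: regress lnrate x1 x2 i.country i.year di1-di15 [w=1/v]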
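One thing to watch in the two-step procedure: fitted values from a linear regression of squared residuals are not guaranteed to be positive, and a zero or negative fitted variance cannot serve as a weight. A common variant, which is not the procedure written above, models the log of the squared residuals and exponentiates the fitted values. The sketch below runs on simulated panel data; the data-generating process and the names ehat, lnehat2, lnvhat and vhat are assumptions, and di1-di15 are left out because the post does not say what they contain.

```stata
* Simulated country-year panel; all names and numbers are illustrative
clear
set seed 2024
set obs 20
gen country = _n
gen pop = round(100000 + runiform()*4900000)
expand 15
bysort country: gen year = 1989 + _n
gen x1 = rnormal()
gen x2 = rnormal()
* deaths drawn as Poisson counts (an assumption)
gen d = rpoisson(pop*exp(-5 + 0.1*x1 - 0.2*x2))
gen lnrate = ln(d/pop)

* step 1: unweighted fit and residuals
xi: regress lnrate x1 x2 i.country i.year
predict ehat, residuals

* step 2: model the log squared residuals so the fitted variance is positive
gen lnehat2 = ln(ehat^2)
xi: regress lnehat2 x1 x2 i.country i.year
predict lnvhat
gen vhat = exp(lnvhat)

* step 3: reweight by the inverse of the fitted variance
xi: regress lnrate x1 x2 i.country i.year [aweight=1/vhat]
```

Because vhat is an exponential it is always positive, so every observation keeps a valid weight. Whether this estimated-variance approach improves on weighting by population depends on how well the variance model is specified.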
I would appreciate any suggestions or comments.