Evidently Sue wants to explain or predict variations in infant
mortality.
At a guess, the *rain* variables are powers of rainfall (probably mean
annual rainfall), which may be serving, or intended to serve, as
proxies for various direct and indirect effects of climate.
I am pretty clear, as an occasional climatologist, that there's no
theoretical basis [pun intended] for using a polynomial representation
here, and even if there were, it would not be a good idea in practice. I'd
recommend instead some more stable method, say orthogonal polynomials or
a restricted cubic spline representation. See -orthpoly- or -rcspline-.
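A minimal sketch of what I have in mind, assuming that the underlying
rainfall variable is called rain (a hypothetical name; I have not seen the
data) and that long_rain1 to long_rain4 are its first four powers. The new
variable names orain* and rainsp* are also made up for illustration. For
the spline I use official -mkspline- with the cubic option, which creates
restricted cubic spline variables. The predictors that are constant within
mother_rc are left out here, since the fixed effects absorb them.

```stata
* Assumption: rain is the underlying rainfall variable (hypothetical name).

* Option 1: orthogonal polynomials up to degree 4
orthpoly rain, generate(orain1-orain4) degree(4)
areg infant_mort orain1-orain4, absorb(mother_rc)

* Option 2: restricted cubic spline with 4 knots,
* which creates 3 variables, rainsp1-rainsp3
mkspline rainsp = rain, cubic nknots(4)
areg infant_mort rainsp1-rainsp3, absorb(mother_rc)
```

Either way the constructed predictors should be much less correlated with
each other than raw powers are, so no term should be dropped for
near-collinearity. The degree and the number of knots are choices to
experiment with, not recommendations.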
Nick
Sue
I'm running the following regression:
areg infant_mort *rain* urban country ethn_rc, absorb(mother_rc)
where urban, country and ethn_rc are variables that don't vary within
mother_rc (the FE category).
My questions are:
1) since urban, country and ethn_rc don't vary within mother_rc, they
should all get dropped. However, ethn_rc gets estimated. What is odd
is that when I generate
bys mother_rc: egen ddd = mean(ethn_rc)
gen diff = ethn_rc - ddd
diff has only values of zero and it still gets estimated with fixed
effects. Again, ethn_rc is constant within mother_rc.
2) there are 4 variables in rain: long_rain1, long_rain2, long_rain3
and long_rain4.
long_rain1 and long_rain3 are highly correlated, and long_rain2 and
long_rain4 are highly correlated (0.88). My understanding was that a
variable shouldn't get dropped unless it is perfectly correlated with
another. However, one variable gets dropped (long_rain4). I looked at the
raw data but the numbers are not identical. The values are very small, in
the range of 0.0xxx - 0.00xxx. Could this be causing the problem?