Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: Interpretation of quadratic terms
From
Michael Mitchell <[email protected]>
To
[email protected]
Subject
Re: st: RE: Interpretation of quadratic terms
Date
Wed, 10 Mar 2010 23:19:23 -0800
Dear Nick, Rodolphe, Rosie, and everyone else...
I happened to come across an example today that illustrates Nick's
point, that sometimes centering can be needed and sometimes not.
I have reproduced this using the -nlsw88.dta- data file.
First, suppose we had a variable -year-, the year the person was
born. I create that as 1968 plus the age of the person. Then I want to
predict -wage- from -year- and -year- squared.
. sysuse nlsw88, clear
(NLSW, 1988 extract)
. generate year = age + 1968
.
. regress wage c.year##c.year
Source | SS df MS Number of obs = 2246
-------------+------------------------------ F( 2, 2243) = 1.72
Model | 114.042127 2 57.0210637 Prob > F = 0.1789
Residual | 74253.9253 2243 33.1047371 R-squared = 0.0015
-------------+------------------------------ Adj R-squared = 0.0006
Total | 74367.9674 2245 33.1260434 Root MSE = 5.7537
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year | 39.17561 55.13424 0.71 0.477 -68.94386 147.2951
|
c.year#|
c.year | -.0097745 .0137323 -0.71 0.477 -.0367039 .017155
|
_cons | -39245.61 55339.8 -0.71 0.478 -147768.2 69276.95
------------------------------------------------------------------------------
The -vif- command shows **very** large VIF values.
. vif
Variable | VIF 1/VIF
-------------+----------------------
year | 1.93e+06 0.000001
c.year#|
c.year | 1.93e+06 0.000001
-------------+----------------------
Mean VIF | 1.93e+06
But, even worse, the -margins- command will not estimate the mean wage
for a year of 1970.
. margins , at(year=1970)
Adjusted predictions Number of obs = 2246
Model VCE : OLS
Expression : Linear prediction, predict()
at : year = 1970
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | (not estimable)
------------------------------------------------------------------------------
But if I estimate this model using -age- instead of -year-, things work better.
.
. regress wage c.age##c.age
Source | SS df MS Number of obs = 2246
-------------+------------------------------ F( 2, 2243) = 1.72
Model | 114.042127 2 57.0210637 Prob > F = 0.1789
Residual | 74253.9253 2243 33.1047371 R-squared = 0.0015
-------------+------------------------------ Adj R-squared = 0.0006
Total | 74367.9674 2245 33.1260434 Root MSE = 5.7537
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .7033681 1.084471 0.65 0.517 -1.423304 2.83004
|
c.age#c.age | -.0097745 .0137323 -0.71 0.477 -.0367039 .017155
|
_cons | -4.696709 21.30931 -0.22 0.826 -46.48474 37.09133
------------------------------------------------------------------------------
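The two fits are the same model in shifted coordinates, which is why the ANOVA table and the squared-term coefficient are identical. A short worked check, where the symbols a_0, a_1, a_2 and c are notation introduced only for this sketch and the numbers are taken from the two tables above:

```latex
\begin{aligned}
\text{wage} &= a_0 + a_1\,\text{age} + a_2\,\text{age}^2,
  \qquad \text{age} = \text{year} - c, \quad c = 1968 \\
&= \left(a_0 - a_1 c + a_2 c^2\right)
  + \left(a_1 - 2 a_2 c\right)\text{year}
  + a_2\,\text{year}^2 \\[4pt]
a_1 - 2 a_2 c &\approx 0.7033681 + 2\,(0.0097745)(1968) \approx 39.176
\end{aligned}
```

Up to rounding of the displayed coefficients, this agrees with the coefficient on -year- in the first regression; only the linear term and the constant change with the shift.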
The -vif- values are still large, but not as enormous as before.
. vif
Variable | VIF 1/VIF
-------------+----------------------
age | 746.80 0.001339
c.age#c.age | 746.80 0.001339
-------------+----------------------
Mean VIF | 746.80
And the -margins- command can estimate the wages for someone who is 40
years old.
. margins , at(age=40)
Adjusted predictions Number of obs = 2246
Model VCE : OLS
Expression : Linear prediction, predict()
at : age = 40
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 7.798891 .1780335 43.81 0.000 7.449951 8.14783
------------------------------------------------------------------------------
So, sometimes collinearity can be high and we can still compute
marginal effects. In other cases, the collinearity can be so high
that, even if the regression model can be estimated, it may not be
possible to estimate marginal effects. It seems to depend on the
degree of collinearity present.
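A minimal sketch of the centering alternative, assuming the same -nlsw88.dta- setup as above. The variable name -year_c- is hypothetical and not part of the example in this post. Centering -year- at its sample mean before squaring is the same kind of shift as moving from -year- to -age-, so one would expect the fitted values and the squared-term coefficient to be unchanged and the VIFs to be much smaller.

```stata
* Sketch only: year_c is a hypothetical name introduced for illustration
sysuse nlsw88, clear
generate year = age + 1968

* Center at the sample mean before the square is formed
summarize year, meanonly
generate double year_c = year - r(mean)

* Same quadratic model, now in the centered variable
regress wage c.year_c##c.year_c
estat vif

* year_c = 0 corresponds to the sample mean of year
margins, at(year_c = 0)
```

The -margins- call asks for the adjusted prediction at the sample mean of -year-, the counterpart of the -at(age=40)- call above.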
Best regards,
Michael
On Tue, Mar 9, 2010 at 1:02 PM, Nick Cox <[email protected]> wrote:
> I think you're both right. In olden days, pre-emptive centring, as we
> say in English, was a good idea in order to avoid numerical problems
> with mediocre programs that did not handle near multicollinearity well.
> Nowadays, decent programs including Stata take care that you get bitten
> as little as possible by such problems. Of course, if you really do have
> multicollinearity, nothing much can help, except that Stata drops
> predictors and flags the issue.
>
> Nick
> [email protected]
>
> Rodolphe Desbordes
>
> My point is that centering does not reduce multicollinearity. As you can
> see in my example, the standard errors of the estimated marginal effects
> at the mean of `mpg' are the same using uncentered or centered values of
> `mpg'.
>
> Rosie Chen
>
> Thanks, Rodolphe, for this helpful demonstration. Agree that the major
> purpose of centering seems to be that we make the interpretation of X
> meaningful. I guess reducing multicollinearity is a by-product of the
> benefit.
>
> Rodolphe Desbordes <[email protected]>
>
> Centering will not affect your estimates and their uncertainty. However,
> centering allows you to directly obtain the estimated effect of X on Y
> for a meaningful value of X, i.e. the mean of X.
>
> . sysuse auto.dta,clear
> (1978 Automobile Data)
>
> . gen double mpg2=mpg^2
>
> . reg price mpg mpg2
>
> Source | SS df MS Number of obs = 74
> -------------+------------------------------ F( 2, 71) = 18.28
> Model | 215835615 2 107917807 Prob > F = 0.0000
> Residual | 419229781 71 5904644.81 R-squared = 0.3399
> -------------+------------------------------ Adj R-squared = 0.3213
> Total | 635065396 73 8699525.97 Root MSE = 2429.9
>
> ------------------------------------------------------------------------------
> price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> mpg | -1265.194 289.5443 -4.37 0.000 -1842.529 -687.8593
> mpg2 | 21.36069 5.938885 3.60 0.001 9.518891 33.20249
> _cons | 22716.48 3366.577 6.75 0.000 16003.71 29429.24
> ------------------------------------------------------------------------------
>
> . sum mpg
>
> Variable | Obs Mean Std. Dev. Min Max
> -------------+--------------------------------------------------------
> mpg | 74 21.2973 5.785503 12 41
>
> . local m=r(mean)
>
> . lincom _b[mpg]+2*_b[mpg2]*`m'
>
> ( 1) mpg + 42.59459 mpg2 = 0
>
> ------------------------------------------------------------------------------
> price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> (1) | -355.3442 58.86205 -6.04 0.000 -472.7118 -237.9766
> ------------------------------------------------------------------------------
>
> . gen double mpgm=mpg-`m'
>
> . gen double mpgm2=mpgm^2
>
> . reg price mpgm mpgm2
>
> Source | SS df MS Number of obs = 74
> -------------+------------------------------ F( 2, 71) = 18.28
> Model | 215835615 2 107917807 Prob > F = 0.0000
> Residual | 419229781 71 5904644.81 R-squared = 0.3399
> -------------+------------------------------ Adj R-squared = 0.3213
> Total | 635065396 73 8699525.97 Root MSE = 2429.9
>
> ------------------------------------------------------------------------------
> price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> mpgm | -355.3442 58.86205 -6.04 0.000 -472.7118 -237.9766
> mpgm2 | 21.36069 5.938885 3.60 0.001 9.518891 33.20249
> _cons | 5459.933 343.8718 15.88 0.000 4774.272 6145.594
> ------------------------------------------------------------------------------
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
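Rodolphe's -lincom- step quoted above can also be sketched with factor-variable notation, assuming Stata 11 or later. This is an illustration, not part of the quoted exchange: it fits the same quadratic in -mpg- without generating -mpg2- by hand, then asks -margins- for the slope at the sample mean of -mpg-.

```stata
* Sketch only: same model as the quoted -reg price mpg mpg2-, written with
* factor-variable notation so that -margins- knows the square depends on mpg
sysuse auto, clear
regress price c.mpg##c.mpg

* Slope of price with respect to mpg, evaluated at the sample mean of mpg
margins, dydx(mpg) atmeans
```

The point estimate should agree with the quoted -lincom- result, since both compute the linear coefficient plus twice the squared-term coefficient times the mean of -mpg-. Depending on the Stata version, -margins- may label the test statistic z rather than t.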