Re: st: Standardized interaction terms - which p-values hold?
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Standardized interaction terms - which p-values hold?
Date: Tue, 15 Jan 2013 17:11:42 +0000

I may well be missing something but for such questions I always want
to hear what is the origin of the coordinate system, i.e. where will
the interaction term be zero?
Forget about the standard deviations, but focus on working with
deviations, variables MINUS their means.
Suppose z depends on x and y. Then a term xy implies an origin x = 0, y = 0.
However with new X = x - its mean, Y = y - its mean, XY implies an
origin which in the original units is x = its mean, y = its mean.
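The algebra can be sketched as follows, assuming a linear model with a single product term. The betas, the error term and the bars for sample means are notation introduced here for illustration, not taken from the thread.

```latex
\begin{align*}
z  &= \beta_0 + \beta_1 x + \beta_2 y + \beta_3 xy + \varepsilon,
      \qquad X = x - \bar{x}, \quad Y = y - \bar{y} \\
xy &= (X + \bar{x})(Y + \bar{y})
    = XY + \bar{y}\,X + \bar{x}\,Y + \bar{x}\bar{y} \\
z  &= (\beta_0 + \beta_1 \bar{x} + \beta_2 \bar{y} + \beta_3 \bar{x}\bar{y})
      + (\beta_1 + \beta_3 \bar{y})\,X
      + (\beta_2 + \beta_3 \bar{x})\,Y
      + \beta_3\,XY + \varepsilon
\end{align*}
```

On this reading the coefficient on the product term is unchanged, while the coefficient on X is the slope with respect to x at y = its mean rather than at y = 0, and likewise for Y.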
That's a quite different model, so a different P-value is not a
puzzle; it's expected.
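A minimal sketch of this in Stata, using the auto data from the examples quoted below. The variable names chead, clength, ia_raw and ia_cen and the local macro mlen are made up here for illustration.

```stata
sysuse auto, clear

* Centre each predictor at its sample mean (no rescaling by the SD)
summarize headroom, meanonly
generate double chead = headroom - r(mean)
summarize length, meanonly
local mlen = r(mean)                    // keep the mean of length for lincom
generate double clength = length - r(mean)

* Product terms: origin at (0, 0) versus origin at (mean, mean)
generate double ia_raw = headroom*length
generate double ia_cen = chead*clength

* Uncentred model: "headroom" is the slope of headroom at length = 0
regress mpg headroom length ia_raw

* Slope of headroom at length = its mean, computed from the uncentred fit
lincom headroom + `mlen'*ia_raw

* Centred model: "chead" is the slope of headroom at length = its mean
regress mpg chead clength ia_cen
```

The two fits should agree on the coefficient and P-value of the product term and on R-squared, and the lincom result should match the row for chead in the second regression. The rows for the two level variables should differ between the fits, because they are slopes at different origins.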
Nick
On Tue, Jan 15, 2013 at 4:59 PM, Elisabeth Bublitz
<[email protected]> wrote:
> Thanks for your answers! I do agree that it does not make sense to
> standardize the variables twice. In fact, using Example 1 this would be:
>
> *********
> reg smpg shead slength ia2 // (A) new suggestion
> reg smpg shead slength sia2 // previous suggestion (standardized interaction)
>
> *********
>
> It shows that, once the variables are standardized before forming the
> interaction, it makes no difference whether the interaction term itself is
> standardized again. Both regressions yield identical p-values.
>
> This leaves one more question open: How should you handle changing p-values?
> The option preferred so far (A) returns something different from a
> regression with unstandardized variables (B).
>
> From what I know, a standardization of variables should only change
> coefficients, not p-values. To get the same p-values as in the baseline
> regression (with unstandardized variables) it is necessary to leave
> variables unstandardized before creating the interaction. If I then
> standardize the interaction term, it gives identical p-values (C).
>
> For illustration I follow the examples from before again:
>
> **********
> egen sia1 = std(ia)
> reg mpg head length ia // (B) baseline
> * (C) same p-values as (B): interaction formed from the unstandardized
> * variables and then standardized
> reg smpg shead slength sia1
> **********
>
> Obviously, the interpretation in the examples does not make sense but they
> just serve as an illustration. But in general I'm wondering what might be
> going on.
>
> -Elisabeth
>
> On 15.01.2013 17:24, Jeffrey Wooldridge wrote:
>
>> For what it's worth, I agree with Joerg. I don't see that
>> standardizing the interaction makes sense; nor does it solve a
>> substantive problem. Centering the variables before interacting them
>> often does, but that's because it forces the coefficients on the level
>> variables to be interpreted as marginal effects at the means of the
>> covariates. This often does make more sense than the partial effects
>> at zero. For example, what sense would it make to estimate the effects
>> of headroom on mpg for a car with length = 0?
>>
>> In your example, I assume the variables are rates at something like
>> the county level. But it still would make no sense to evaluate the
>> partial effect of death -- whether it is standardized or not -- at
>> medage = 0.
>>
>> On Tue, Jan 15, 2013 at 11:13 AM, Joerg Luedicke
>> <[email protected]> wrote:
>>>
>>> In your two examples, you are comparing apples and oranges. If you
>>> center your variables in example 1 such that their mean is zero, you
>>> should get the same results as in example 2. However, I would not
>>> standardize the interaction term itself because it does not seem to be
>>> very meaningful. If the two predictors are standardized, then their
>>> interaction shows the effect of one predictor on the effect of the
>>> other in standard deviation units. If the interaction term itself is
>>> standardized (or if you calculate a standardized coefficient) you
>>> can't interpret it that way.
>>>
>>> Joerg
>>>
>>> On Tue, Jan 15, 2013 at 10:01 AM, Elisabeth Bublitz
>>> <[email protected]> wrote:
>>>>
>>>> Hi Statalist,
>>>>
>>>> When I compare the p-values of a baseline regression with those obtained
>>>> from a regression with standardized coefficients and interaction terms,
>>>> the following problem comes up: The suggestions previously posted (see
>>>> http://www.stata.com/statalist/archive/2009-04/msg00888.html) are that
>>>> the variables forming the interaction need to be standardized before they
>>>> are interacted, and a second time afterwards. This changes the p-values,
>>>> and sometimes coefficients even change their signs. Intuitively this
>>>> suggests to me that something in the previous suggestion is not correct.
>>>>
>>>>
>>>> Here is the example from the previous thread:
>>>> *-------------------Example 1--------------------------------
>>>> * This version standardizes the IA once and serves as an example of
>>>> * what is "incorrect"
>>>> sysuse auto, clear
>>>> gen ia = head*length
>>>> reg mpg head length ia, beta
>>>>
>>>> * This version standardizes the IA twice and is suggested to be "correct"
>>>> egen shead = std(headroom)
>>>> egen slength = std(length)
>>>> egen smpg = std(mpg)
>>>> gen ia2 = shead*slength
>>>> egen sia2 = std(ia2)
>>>> reg smpg shead slength sia2
>>>> *-----------------------------------------------------------------
>>>>
>>>>
>>>> In this example the changes are visible but do not cross conventional
>>>> thresholds, so the significance levels stay the same. This is, however,
>>>> different for the data I use. I'd be curious to learn what you think
>>>> about this.
>>>>
>>>> I found an example where the changes are more visible.
>>>>
>>>> *-------------------Example 2--------------------------------
>>>> sysuse census, clear
>>>>
>>>> * Standardizing variables
>>>> egen zdivorce = std(divorce)
>>>> egen zmarriage = std(marriage)
>>>> egen zdeath = std(death)
>>>> egen zmedage = std(medage)
>>>>
>>>> * Interaction terms
>>>> gen ia= death*medage
>>>> egen zia_1= std(ia)
>>>> gen test = zdeath*zmedage
>>>> egen zia_2 = std(test)
>>>>
>>>> * Regression
>>>> * (1) this follows the simpler procedure
>>>> reg divorce marriage death medage ia, beta
>>>> * (2) this standardizes the IA twice, note changes in significance
>>>> * levels and coefficient size
>>>> reg divorce marriage death medage test, beta
>>>> * (3) for comparison (identical with (2)): this is the same as
>>>> * suggested in the previous thread
>>>> reg zdivorce zmarriage zdeath zmedage zia_2
>>>> *-----------------------------------------------------------------
>>>>
>>>> Unfortunately, I need to compare the size of two interactions and, thus,
>>>> need standardized coefficients. If you have other suggestions, let me
>>>> know. I was wondering whether it would make sense to use logarithms
>>>> instead.