Stata 11 introduced the margins command, which superseded adjust.
Title: Producing adjusted means after ANOVA
Author: Kenneth Higbee, StataCorp
Someone posed the following question:
I am running some simple ANOVAs and also want to produce the adjusted means. The command is a 3-way ANOVA with a single 2-way interaction. All the predictors are dichotomous (0/1) variables. There were a few problems with the output.
First, when I run
. anova opportu2 volsex frcsex volsex*frcsex q3
and then
. adjust q3 if e(sample), by(volsex frcsex) se ci
I get a table with all cells missing.
I then decided to run
. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci
This did work! What is going on?
The result you show, comparing q3 used as a categorical variable with q3 specified as a continuous variable in the ANOVA, does not surprise me. Let me explain why, using the auto data.
. sysuse auto
(1978 Automobile Data)

. gen z = trunk < 14

. anova wei rep for rep*for z

                           Number of obs =      69     R-squared     =  0.6079
                           Root MSE      =  528.54     Adj R-squared =  0.5556

                  Source |  Partial SS    df       MS           F     Prob > F
           --------------+----------------------------------------------------
                   Model |  25984464.5     8  3248058.07      11.63     0.0000
                         |
                   rep78 |  1524294.09     4  381073.522       1.36     0.2571
                 foreign |  3521325.49     1  3521325.49      12.61     0.0008
           rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
                       z |  3248513.88     1  3248513.88      11.63     0.0012
                         |
                Residual |  16761251.4    60   279354.19
           --------------+----------------------------------------------------
                   Total |  42745715.9    68   628613.47
I used the 0/1 variable z as a categorical variable in the anova above.
Now, just as you experienced, when I use adjust to set z to its mean, I get nothing useful.
. adjust z if e(sample), by(for rep) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)

----------------------------------------
          |       Repair Record 1978
 Car type |    1     2     3     4     5
----------+-----------------------------
 Domestic |
          |
          |
          |
  Foreign |
          |
          |
----------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
Notice the “(8 missing values generated)” messages.
Think about the parameterization used by ANOVA models. Categorical variables enter the design matrix as a set of indicator (also called dummy) variables. For instance, rep78, which has five levels, becomes 5 columns in the ANOVA design matrix (one of these columns is later dropped during estimation, since there are only 4 degrees of freedom for the 5 levels). foreign becomes 2 columns in the design matrix, one of which is dropped during estimation. The same is true for the z variable.
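To make the indicator coding concrete, here is a small Python sketch of how a categorical variable becomes one 0/1 column per level. It is an illustration of the idea, not Stata code. The helper name `indicator_columns` and the toy data are hypothetical. Every row has exactly one indicator equal to 1, so the columns add up to the constant column, and that is why one of them must be dropped.

```python
def indicator_columns(values):
    """Return the sorted distinct levels and one 0/1 indicator column per level."""
    levels = sorted(set(values))
    columns = [[1 if v == level else 0 for v in values] for level in levels]
    return levels, columns


# Toy rep78-like data with all five levels present (hypothetical, not the auto data).
rep78 = [3, 3, 1, 4, 5, 2, 3]
levels, cols = indicator_columns(rep78)

# Each row has exactly one indicator set, so the columns sum to a column of ones,
# duplicating the constant; only len(levels) - 1 of them are independent of it.
row_sums = [sum(col[i] for col in cols) for i in range(len(rep78))]

print(levels)    # [1, 2, 3, 4, 5]
print(len(cols)) # 5
print(row_sums)  # [1, 1, 1, 1, 1, 1, 1]
```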
Here is a look at the underlying regression for the ANOVA above:
. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2182.261   187.7289    11.62   0.000     1806.748    2557.775
rep78
  1                  1140   528.5397     2.16   0.035     82.76324    2197.237
  2              1020.691   431.9311     2.36   0.021     156.7003    1884.682
  3             -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
  4             -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
  5             (dropped)
foreign
  1             -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
  2             (dropped)
z
  1              497.4118   145.8651     3.41   0.001     205.6382    789.1855
  2             (dropped)
rep78*foreign
  1 1           (dropped)
  2 1           (dropped)
  3 1            1470.257   536.9423     2.74   0.008     396.2125    2544.301
  3 2           (dropped)
  4 1            1270.366   504.0553     2.52   0.014     262.1051    2278.627
  4 2           (dropped)
  5 1           (dropped)
  5 2           (dropped)
------------------------------------------------------------------------------
predict produces missing values when adjust asks it for predictions at z = .43478259 in each cell of the table. It does this because z entered the ANOVA model as a categorical variable whose only valid values are 0 and 1, and z = .43478259 corresponds to neither.
Likewise, predict would produce a missing value if you asked for a prediction at rep78 = 3.257 or rep78 = 12. After anova, the only values of a categorical variable that predict accepts are the levels present in the ANOVA.
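The failed lookup can be mimicked in a few lines of Python. This is a sketch of the arithmetic only, not of Stata's internals. It uses the coefficients printed by regress above for the rep78 = 5, Foreign cell. In that cell every rep78, foreign, and interaction term is dropped, so only _cons and the z term remain. The function name is hypothetical, and NaN stands in for Stata's missing value.

```python
import math

CONS = 2182.261            # _cons from the categorical-z regression above
Z_EFFECT = {0: 497.4118,   # z level 1 (z == 0)
            1: 0.0}        # z level 2 (z == 1) was dropped, so it adds nothing


def predict_rep5_foreign(z):
    """Linear prediction for the rep78 == 5, Foreign cell with z categorical."""
    if z not in Z_EFFECT:
        # No indicator column corresponds to this value of z.
        return math.nan
    return CONS + Z_EFFECT[z]


print(predict_rep5_foreign(0))           # about 2679.67
print(predict_rep5_foreign(1))           # 2182.261
print(predict_rep5_foreign(0.43478259))  # nan
```

At z = 0 the sketch gives 2679.67, the value adjust reports for this cell when z is set to 0 below. A non-level value such as .43478259 has no column to look up.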
Now watch what happens when I do the following adjust:
. adjust z=0 if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
 Covariate set to value: z = 0
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |            Car type
1978      |      Domestic          Foreign
----------+-------------------------------------
        1 |       3597.41
          |      (401.19)
          | [2794.91,4399.91]
          |
        2 |        3478.1
          |     (190.392)
          | [3097.26,3858.94]
          |
        3 |        3589.6          2341.61
          |     (110.519)        (320.272)
          | [3368.53,3810.67] [1700.97,2982.25]
          |
        4 |       3642.76          2594.65
          |     (179.137)        (209.548)
          | [3284.43,4001.09]  [2175.5,3013.81]
          |
        5 |       2457.41          2679.67
          |      (401.19)        (193.923)
          | [1654.91,3259.91] [2291.77,3067.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
The answers I got are the adjusted predictions when z is 0. I could also get predictions when z is 1 with
. adjust z=1 if e(sample), by(rep for) se ci
(output omitted)
If I ask for predictions at any other values of z besides 0 or 1, I will get missing values from the predictions.
Now suppose that, instead of entering z into the anova model as a categorical variable, I enter it as a continuous variable (a covariate, as in an ANCOVA).
. anova wei rep for rep*for z, cont(z)

                           Number of obs =      69     R-squared     =  0.6079
                           Root MSE      =  528.54     Adj R-squared =  0.5556

                  Source |  Partial SS    df       MS           F     Prob > F
           --------------+----------------------------------------------------
                   Model |  25984464.5     8  3248058.07      11.63     0.0000
                         |
                   rep78 |  1524294.09     4  381073.522       1.36     0.2571
                 foreign |  3521325.49     1  3521325.49      12.61     0.0008
           rep78*foreign |  2300624.62     2  1150312.31       4.12     0.0211
                       z |  3248513.88     1  3248513.88      11.63     0.0012
                         |
                Residual |  16761251.4    60   279354.19
           --------------+----------------------------------------------------
                   Total |  42745715.9    68   628613.47
The ANOVA table looks the same, but the underlying representation is different. The design matrix has one fewer column: z contributes a single column instead of two (one for z = 0 and one for z = 1). Because z is now treated as “continuous”, anova accepts whatever values z happens to take. Since z has only two levels (0 and 1), the resulting ANOVA table is identical. This would not be true if z had 3 or more levels: with k levels, the first anova would use k - 1 degrees of freedom for z, while the second would still use only 1.
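Here is a toy Python illustration of the degrees-of-freedom point. It is not Stata code, and the helper names are hypothetical. A categorical term contributes one indicator column per level and loses one degree of freedom to the constant. A continuous term contributes the variable itself as a single column. The sketch ignores any collinearity with other terms in the model.

```python
def term_columns(values, continuous):
    """Design-matrix columns contributed by one term (toy sketch)."""
    if continuous:
        return [list(values)]  # the variable itself: a single column
    levels = sorted(set(values))
    return [[1 if v == level else 0 for v in values] for level in levels]


def term_df(values, continuous):
    """Degrees of freedom for the term; one indicator is redundant with _cons."""
    n_cols = len(term_columns(values, continuous))
    return n_cols if continuous else n_cols - 1


two_levels = [0, 1, 1, 0, 1]
three_levels = [0, 1, 2, 2, 1]

for name, z in [("two levels", two_levels), ("three levels", three_levels)]:
    # prints: categorical df, continuous df
    print(name, term_df(z, continuous=False), term_df(z, continuous=True))
```

For a 0/1 variable both codings use 1 degree of freedom. With three levels the categorical coding uses 2 and the continuous coding still uses 1, so the two ANOVA tables would differ.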
Here is the underlying regression for the ANOVA above:
. regress

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  8,    60) =   11.63
       Model |  25984464.5     8  3248058.07           Prob > F      =  0.0000
    Residual |  16761251.4    60   279354.19           R-squared     =  0.6079
-------------+------------------------------           Adj R-squared =  0.5556
       Total |  42745715.9    68   628613.47           Root MSE      =  528.54

------------------------------------------------------------------------------
      weight        Coef.   Std. Err.       t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------
_cons            2679.673   193.9232    13.82   0.000     2291.769    3067.577
rep78
  1                  1140   528.5397     2.16   0.035     82.76324    2197.237
  2              1020.691   431.9311     2.36   0.021     156.7003    1884.682
  3             -338.0654   352.7323    -0.96   0.342    -1043.635    367.5043
  4             -85.01959   251.2557    -0.34   0.736    -587.6057    417.5666
  5             (dropped)
foreign
  1             -222.2614   418.2335    -0.53   0.597    -1058.853    614.3301
  2             (dropped)
z               -497.4118   145.8651    -3.41   0.001    -789.1855   -205.6382
rep78*foreign
  1 1           (dropped)
  2 1           (dropped)
  3 1            1470.257   536.9423     2.74   0.008     396.2125    2544.301
  3 2           (dropped)
  4 1            1270.366   504.0553     2.52   0.014     262.1051    2278.627
  4 2           (dropped)
  5 1           (dropped)
  5 2           (dropped)
------------------------------------------------------------------------------
Unlike in the first anova, z here has only one row in the output, reflecting the different representation within anova.
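With z continuous, a cell's prediction is _cons plus the z coefficient times the chosen value of z, so any value of z works. The Python sketch below illustrates that arithmetic; it is not Stata code. It covers the rep78 = 5, Foreign cell, where all rep78, foreign, and interaction terms are dropped, and uses the coefficients printed by the regress just above. The function name is hypothetical.

```python
CONS = 2679.673   # _cons from the continuous-z regression
B_Z = -497.4118   # coefficient on z


def predict_rep5_foreign(z):
    """Linear prediction for the rep78 == 5, Foreign cell with z continuous."""
    return CONS + B_Z * z


for z in (0, 1, 0.43478259):
    print(z, round(predict_rep5_foreign(z), 2))
```

At z = 0 and z = 1 this gives 2679.67 and 2182.26, the same cell values the categorical model produces (2182.261 + 497.4118 and 2182.261). At z = .43478259 it gives 2463.41, the value adjust reports for this cell below.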
Here it makes sense for predict to produce predictions at z = .43478259. As far as anova and predict are concerned, z is continuous and can take on any value.
. adjust z if e(sample), by(rep for) se ci

----------------------------------------------------------------------------
     Dependent variable: weight     Command: anova
  Covariate set to mean: z = .43478259
----------------------------------------------------------------------------

------------------------------------------------
Repair    |
Record    |            Car type
1978      |      Domestic          Foreign
----------+-------------------------------------
        1 |       3381.15
          |      (382.72)
          |  [2615.59,4146.7]
          |
        2 |       3261.84
          |     (188.801)
          | [2884.18,3639.49]
          |
        3 |       3373.34          2125.34
          |     (103.704)        (307.021)
          |  [3165.9,3580.78] [1511.21,2739.48]
          |
        4 |       3426.49          2378.39
          |     (178.887)        (183.146)
          | [3068.66,3784.32] [2012.04,2744.73]
          |
        5 |       2241.15          2463.41
          |      (382.72)        (177.058)
          |  [1475.59,3006.7] [2109.24,2817.58]
------------------------------------------------
     Key:  Linear Prediction
           (Standard Error)
           [95% Confidence Interval]
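A categorical z can also yield a sensible "adjusted" value. Instead of plugging the mean of z into a single term, set each level's indicator to its sample proportion: about .5652 for z = 0 and .4348 for z = 1. As far as we understand, this is what margins does with its atmeans option in Stata 11 and later, but treat that as an assumption and check it against the margins documentation. The Python sketch below is an illustration, not Stata code. It uses the coefficients from the two regress outputs above and shows that both routes give the same value for the rep78 = 5, Foreign cell, up to rounding of the printed coefficients.

```python
P_Z1 = 0.43478259   # mean of z, i.e., share of observations with z == 1
P_Z0 = 1 - P_Z1     # share of observations with z == 0

# Categorical z: _cons plus each z indicator's coefficient times its proportion.
cat_cons, cat_z0, cat_z1 = 2182.261, 497.4118, 0.0
categorical_at_proportions = cat_cons + cat_z0 * P_Z0 + cat_z1 * P_Z1

# Continuous z: _cons plus the slope times the mean of z.
cont_cons, cont_slope = 2679.673, -497.4118
continuous_at_mean = cont_cons + cont_slope * P_Z1

print(round(categorical_at_proportions, 2), round(continuous_at_mean, 2))
```

Both values round to 2463.41, matching the table above. The predictions are the same because each model's prediction is linear in the z columns.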