Stata 11 introduced the margins command, which superseded adjust.
| Title | Producing adjusted means after ANOVA |
| Author | Kenneth Higbee, StataCorp |
Someone posed the following question:
I am running some simple ANOVAs and wanted also to produce the adjusted means. The command is a 3-way ANOVA with a single 2-way interaction. All the predictors are dichotomous (0/1) variables. There were a few problems with the output.
First, when I run
. anova opportu2 volsex frcsex volsex*frcsex q3
and then
. adjust q3 if e(sample), by(volsex frcsex) se ci
I get a table with all cells missing.
I then decided to run
. anova opportu2 volsex frcsex volsex*frcsex q3, cont(q3)
. adjust q3 if e(sample), by(volsex frcsex) se ci
This did work! What is going on?
The result you show, comparing when q3 was used as a categorical variable and when it was specified to be a continuous variable in the ANOVA, does not surprise me. Let me explain why using the auto data.
. sysuse auto
(1978 Automobile Data)
. gen z = trunk < 14
. anova wei rep for rep*for z
Number of obs = 69 R-squared = 0.6079
Root MSE = 528.54 Adj R-squared = 0.5556
Source | Partial SS df MS F Prob > F
--------------+----------------------------------------------------
Model | 25984464.5 8 3248058.07 11.63 0.0000
|
rep78 | 1524294.09 4 381073.522 1.36 0.2571
foreign | 3521325.49 1 3521325.49 12.61 0.0008
rep78*foreign | 2300624.62 2 1150312.31 4.12 0.0211
z | 3248513.88 1 3248513.88 11.63 0.0012
|
Residual | 16761251.4 60 279354.19
--------------+----------------------------------------------------
Total | 42745715.9 68 628613.47
I used the 0/1 variable z as a categorical variable in the anova above.
Now, just as you experienced, when I use adjust to set z to its mean, I get nothing useful.
. adjust z if e(sample), by(for rep) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
(8 missing values generated)
(8 missing values generated)
----------------------------------------
| Repair Record 1978
Car type | 1 2 3 4 5
----------+-----------------------------
Domestic |
|
|
|
Foreign |
|
|
----------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
Notice the messages about missing values generated.
Think about the parameterization used by ANOVA models. Categorical variables enter the design matrix as a set of indicator (also called dummy) variables. For instance, rep78, which has five levels, becomes 5 columns in the ANOVA design matrix (one of these columns will later be dropped in the estimation process because there are only 4 degrees of freedom for the 5 levels). Likewise, foreign becomes 2 columns in the design matrix, and one of them will be dropped later in the estimation process. The same is true for the z variable.
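To see what these indicator columns look like, the sketch below builds them by hand with the generate() option of tabulate. The stub names r, f, and zz are arbitrary choices for illustration; they are not variables that anova itself creates.

```stata
sysuse auto, clear
generate z = trunk < 14

* One 0/1 indicator per observed level of each categorical variable
* r1-r5 for rep78 (missing where rep78 is missing)
tabulate rep78, generate(r)
* f1 and f2 for foreign
tabulate foreign, generate(f)
* zz1 marks z == 0 and zz2 marks z == 1
tabulate z, generate(zz)

* For each variable, exactly one indicator is 1 in every observation
list rep78 r1-r5 foreign f1 f2 z zz1 zz2 in 1/5
```

Because the indicators for a variable always sum to 1, they are collinear with the constant, which is why one column per variable has to be dropped.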
Here is a look at the underlying regression for the ANOVA above:
. regress
Source | SS df MS Number of obs = 69
-------------+------------------------------ F( 8, 60) = 11.63
Model | 25984464.5 8 3248058.07 Prob > F = 0.0000
Residual | 16761251.4 60 279354.19 R-squared = 0.6079
-------------+------------------------------ Adj R-squared = 0.5556
Total | 42745715.9 68 628613.47 Root MSE = 528.54
------------------------------------------------------------------------------
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
_cons 2182.261 187.7289 11.62 0.000 1806.748 2557.775
rep78
1 1140 528.5397 2.16 0.035 82.76324 2197.237
2 1020.691 431.9311 2.36 0.021 156.7003 1884.682
3 -338.0654 352.7323 -0.96 0.342 -1043.635 367.5043
4 -85.01959 251.2557 -0.34 0.736 -587.6057 417.5666
5 (dropped)
foreign
1 -222.2614 418.2335 -0.53 0.597 -1058.853 614.3301
2 (dropped)
z
1 497.4118 145.8651 3.41 0.001 205.6382 789.1855
2 (dropped)
rep78*foreign
1 1 (dropped)
2 1 (dropped)
3 1 1470.257 536.9423 2.74 0.008 396.2125 2544.301
3 2 (dropped)
4 1 1270.366 504.0553 2.52 0.014 262.1051 2278.627
4 2 (dropped)
5 1 (dropped)
5 2 (dropped)
------------------------------------------------------------------------------
predict produces missing values when asked for predictions in these cells (hence the messages about 8 missing values generated). It does this because z entered the ANOVA model as a categorical variable with 0 and 1 as its only valid values, and z = .43478259 does not correspond to either 0 or 1.
predict would, for instance, also produce a missing value if you asked for a prediction when rep78 = 3.257, rep78 = 12, etc. After anova, the only valid values for categorical variables for predict are those values present in the ANOVA.
Now watch what happens when I do the following adjust:
. adjust z=0 if e(sample), by(rep for) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to value: z = 0
----------------------------------------------------------------------------
------------------------------------------------
Repair |
Record | Car type
1978 | Domestic Foreign
----------+-------------------------------------
1 | 3597.41
| (401.19)
| [2794.91,4399.91]
|
2 | 3478.1
| (190.392)
| [3097.26,3858.94]
|
3 | 3589.6 2341.61
| (110.519) (320.272)
| [3368.53,3810.67] [1700.97,2982.25]
|
4 | 3642.76 2594.65
| (179.137) (209.548)
| [3284.43,4001.09] [2175.5,3013.81]
|
5 | 2457.41 2679.67
| (401.19) (193.923)
| [1654.91,3259.91] [2291.77,3067.58]
------------------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
The answers I got are the adjusted predictions when z is 0. I could also get predictions when z is 1 with
. adjust z=1 if e(sample), by(rep for) se ci
(output omitted)
If I ask for predictions at any other values of z besides 0 or 1, I will get missing values from the predictions.
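The adjusted predictions at z = 0 can be checked by hand from the regression coefficients shown earlier. The following is a minimal sketch: the scalar names are made up for illustration, the values are copied from the regress output, and it relies on level 1 of z being z = 0 (with z = 1 as the dropped level).

```stata
* Coefficients copied from the regress output above (z categorical)
scalar b_cons = 2182.261
scalar b_rep3 = -338.0654
scalar b_for1 = -222.2614
scalar b_z1   = 497.4118
scalar b_r3f1 = 1470.257

* rep78 = 3, Domestic, z = 0
display %9.2f b_cons + b_rep3 + b_for1 + b_z1 + b_r3f1

* rep78 = 5, Foreign, z = 0: only the z == 0 indicator is switched on
display %9.2f b_cons + b_z1
```

The two results, 3589.60 and 2679.67, agree with the corresponding cells of the adjust table. No such sum exists for z = .43478259, because there is no indicator column for that value.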
Now suppose that, instead of entering z into the anova model as a categorical variable, you enter it as a continuous variable (a covariate in an ANCOVA).
. anova wei rep for rep*for z, cont(z)
Number of obs = 69 R-squared = 0.6079
Root MSE = 528.54 Adj R-squared = 0.5556
Source | Partial SS df MS F Prob > F
--------------+----------------------------------------------------
Model | 25984464.5 8 3248058.07 11.63 0.0000
|
rep78 | 1524294.09 4 381073.522 1.36 0.2571
foreign | 3521325.49 1 3521325.49 12.61 0.0008
rep78*foreign | 2300624.62 2 1150312.31 4.12 0.0211
z | 3248513.88 1 3248513.88 11.63 0.0012
|
Residual | 16761251.4 60 279354.19
--------------+----------------------------------------------------
Total | 42745715.9 68 628613.47
The ANOVA table looks the same, but the underlying representation is different. There is one fewer column in the design matrix: z now has one column instead of two (one for z=0 and one for z=1). Because z is “continuous”, the ANOVA accepts whatever values happen to be in z. Because z has only two levels (0 and 1), the resulting ANOVA table is identical. This would not be true if z had 3 or more levels: the first anova would then have more than 1 degree of freedom for z, while the second anova would continue to have only 1 degree of freedom.
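The point about 3 or more levels can be illustrated with rep78, which has five levels. This sketch is written in the Stata 11 and later syntax, where the c. prefix marks a continuous variable; in the older syntax used in this FAQ, the second command would be anova weight foreign rep78, cont(rep78). The model is a simplified one chosen for illustration, not the model discussed above.

```stata
sysuse auto, clear

* rep78 treated as categorical: 4 degrees of freedom for its 5 levels
anova weight foreign rep78

* rep78 treated as continuous: a single slope, so 1 degree of freedom
anova weight foreign c.rep78
```

Here the two ANOVA tables differ, because a single slope cannot reproduce four separate level effects.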
Here is the underlying regression for this second ANOVA, in which z is continuous:
. regress
Source | SS df MS Number of obs = 69
-------------+------------------------------ F( 8, 60) = 11.63
Model | 25984464.5 8 3248058.07 Prob > F = 0.0000
Residual | 16761251.4 60 279354.19 R-squared = 0.6079
-------------+------------------------------ Adj R-squared = 0.5556
Total | 42745715.9 68 628613.47 Root MSE = 528.54
------------------------------------------------------------------------------
weight Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
_cons 2679.673 193.9232 13.82 0.000 2291.769 3067.577
rep78
1 1140 528.5397 2.16 0.035 82.76324 2197.237
2 1020.691 431.9311 2.36 0.021 156.7003 1884.682
3 -338.0654 352.7323 -0.96 0.342 -1043.635 367.5043
4 -85.01959 251.2557 -0.34 0.736 -587.6057 417.5666
5 (dropped)
foreign
1 -222.2614 418.2335 -0.53 0.597 -1058.853 614.3301
2 (dropped)
z -497.4118 145.8651 -3.41 0.001 -789.1855 -205.6382
rep78*foreign
1 1 (dropped)
2 1 (dropped)
3 1 1470.257 536.9423 2.74 0.008 396.2125 2544.301
3 2 (dropped)
4 1 1270.366 504.0553 2.52 0.014 262.1051 2278.627
4 2 (dropped)
5 1 (dropped)
5 2 (dropped)
------------------------------------------------------------------------------
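Unlike the first anova, z here has only one row in the output, because the underlying representation within anova is different. The coefficient also changes sign: it is now the slope on z (the change in predicted weight from z=0 to z=1), whereas in the first regression it was the coefficient on the indicator for the first level, z=0, with z=1 as the dropped level.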
Here it makes sense to ask predict, after anova, for predictions when z is .43478259. As far as anova and predict are concerned, the z variable is continuous and can take on any value.
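Because z now enters through a single slope, a prediction at any value of z is simple arithmetic on the coefficients. The sketch below uses values copied from the regress output above; the scalar names are hypothetical.

```stata
* Coefficients copied from the regress output above (z continuous)
scalar c_cons = 2679.673
scalar c_rep3 = -338.0654
scalar c_for1 = -222.2614
scalar c_r3f1 = 1470.257
scalar c_z    = -497.4118
scalar zbar   = .43478259

* rep78 = 5, Foreign: every indicator belongs to a dropped level
display %9.2f c_cons + c_z*zbar

* rep78 = 3, Domestic
display %9.2f c_cons + c_rep3 + c_for1 + c_r3f1 + c_z*zbar
```

These give 2463.41 and 3373.34, which are the values adjust reports for those two cells in the output that follows.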
. adjust z if e(sample), by(rep for) se ci
----------------------------------------------------------------------------
Dependent variable: weight Command: anova
Covariate set to mean: z = .43478259
----------------------------------------------------------------------------
------------------------------------------------
Repair |
Record | Car type
1978 | Domestic Foreign
----------+-------------------------------------
1 | 3381.15
| (382.72)
| [2615.59,4146.7]
|
2 | 3261.84
| (188.801)
| [2884.18,3639.49]
|
3 | 3373.34 2125.34
| (103.704) (307.021)
| [3165.9,3580.78] [1511.21,2739.48]
|
4 | 3426.49 2378.39
| (178.887) (183.146)
| [3068.66,3784.32] [2012.04,2744.73]
|
5 | 2241.15 2463.41
| (382.72) (177.058)
| [1475.59,3006.7] [2109.24,2817.58]
------------------------------------------------
Key: Linear Prediction
(Standard Error)
[95% Confidence Interval]
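For readers on Stata 11 or later, where margins has superseded adjust, the same two analyses might be written roughly as follows. This is a sketch under the assumption that the factor-variable syntax (## for main effects plus interaction, c. for continuous) is available; it has not been checked against the adjust output above, although the linear predictions should correspond.

```stata
sysuse auto, clear
generate z = trunk < 14

* z as a categorical variable: fix it at one of its observed levels
anova weight rep78##foreign z
margins rep78#foreign, at(z=0)

* z as a continuous covariate: fix it at its estimation-sample mean
anova weight rep78##foreign c.z
margins rep78#foreign, at((mean) z)
```

Cells with no observations (foreign cars with rep78 of 1 or 2) have no prediction to report, so expect margins to flag them rather than display a number.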