st: bug in -anova-
I think I have found a bug in -anova- in Stata 11 related to its automatic handling of categorical variables. If I rely on that automatic handling (writing c rather than ibn.c), -anova- drops a level when the noconstant option is specified.
. anova htr pretreat##strain // normal result
                  Number of obs =      36     R-squared     =  0.8933
                  Root MSE      = 7.77827     Adj R-squared =  0.8755

         Source |  Partial SS    df       MS           F     Prob > F
----------------+----------------------------------------------------
          Model |  15189.5106     5  3037.90212      50.21     0.0000
                |
       pretreat |   11977.163     2  5988.58148      98.98     0.0000
         strain |  1953.15158     1  1953.15158      32.28     0.0000
pretreat#strain |  1259.19604     2  629.598022      10.41     0.0004
                |
       Residual |  1815.04618    30  60.5015394
----------------+----------------------------------------------------
          Total |  17004.5568    35  485.844479
. egen c = group(pretreat strain) // make variable for cell means model
. table pretreat strain, c(mean c) // show cell means grouping variable
----------------------------
          |      strain
 pretreat | C57BL/6J  DBA/2J
----------+-----------------
 SB206553 |        1       2
 SB242084 |        3       4
   Saline |        5       6
----------------------------
. anova htr ibn.c, noconstant // run cell means model ***CORRECT with 6 df***
           Number of obs =      36     R-squared     =  0.9624
           Root MSE      = 7.77827     Adj R-squared =  0.9548

  Source |  Partial SS    df       MS           F     Prob > F
---------+----------------------------------------------------
   Model |  46410.4313     6  7735.07188     127.85     0.0000
         |
       c |  46410.4313     6  7735.07188     127.85     0.0000
         |
Residual |  1815.04618    30  60.5015394
---------+----------------------------------------------------
   Total |  48225.4775    36   1339.5966
. anova htr c, noconstant // run cell means model ***WRONG with 5 df***
           Number of obs =      36     R-squared     =  0.9496
           Root MSE      = 8.85083     Adj R-squared =  0.9415

  Source |  Partial SS    df       MS           F     Prob > F
---------+----------------------------------------------------
   Model |  45797.0261     5  9159.40521     116.92     0.0000
         |
       c |  45797.0261     5  9159.40521     116.92     0.0000
         |
Residual |  2428.45141    31  78.3371423
---------+----------------------------------------------------
   Total |  48225.4775    36   1339.5966
. regress
      Source |       SS       df       MS              Number of obs =      36
-------------+------------------------------           F(  5,    31) =  116.92
       Model |  45797.0261     5  9159.40521           Prob > F      =  0.0000
    Residual |  2428.45141    31  78.3371423           R-squared     =  0.9496
-------------+------------------------------           Adj R-squared =  0.9415
       Total |  48225.4775    36   1339.5966           Root MSE      =  8.8508

------------------------------------------------------------------------------
         htr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           c |
          2  |   14.16667   3.613335     3.92   0.000     6.797221    21.53611
          3  |   17.08333   3.613335     4.73   0.000     9.713888    24.45278
          4  |         26   3.613335     7.20   0.000     18.63055    33.36945
          5  |   39.05555   3.613335    10.81   0.000      31.6861      46.425
          6  |   70.27778   3.613335    19.45   0.000     62.90834    77.64723
------------------------------------------------------------------------------
. test, showorder
Order of columns in the design matrix
1: (c==1)
2: (c==2)
3: (c==3)
4: (c==4)
5: (c==5)
6: (c==6)
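As a rough arithmetic check on the two noconstant fits, the -display- lines below use only numbers copied from the output above. The residual SS and the model SS differ between the two fits by the same amount (roughly 613.4), which is what one would expect if the c==1 column were dropped and that cell's mean were effectively forced to zero. The cell size of 6 is an assumption (36 observations in 6 cells, consistent with the identical standard errors in the -regress- output), so the last line is only a sketch of the implied cell mean, not a verified value.

```stata
* Gap between the 5-df and 6-df fits, using values copied from the output above
display 2428.45141 - 1815.04618      // residual SS: 5-df fit minus 6-df fit
display 46410.4313 - 45797.0261      // model SS: 6-df fit minus 5-df fit

* ASSUMPTION: balanced design with 6 observations per cell
display sqrt((2428.45141 - 1815.04618)/6)   // implied |mean of htr| in cell c==1
```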
I know that the FAQ at <http://www.stata.com/support/faqs/stat/test1.html> uses the ibn.c notation, but unless I am missing something, the behavior without ibn. should be fixed.
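For anyone who wants to try the comparison without my data, here is a minimal sketch using the auto dataset that ships with Stata. The choice of price and rep78 is purely illustrative and stands in for htr and c; it is an assumption that this shows the same df discrepancy, and the result may well depend on the Stata version.

```stata
* ASSUMPTION: auto.dta stands in for the original data; rep78 plays the role of c
sysuse auto, clear

anova price ibn.rep78, noconstant   // explicit no-base-level notation
anova price rep78, noconstant       // relies on automatic categorical handling
test, showorder                     // list the design-matrix columns
```

Comparing the df reported for the model term in the two -anova- tables is the quickest way to see whether a level has been dropped.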
