Jimmy Verner wrote:
Suppose you have an interval dependent variable Y, an interval
independent variable B and a nominal variable C. C has four
categories, C1, C2, C3 and C4. C is coded by four dummy variables, C1
through C4, with the value 1 when "in play" and the value 0 otherwise.
One may regress Y on B and C1 through C4 by dropping the constant:
Model A: reg Y B C1 C2 C3 C4, nocon
Alternatively, one may keep the constant but drop a category to avoid
falling into the dummy variable trap. The constant replaces the
dropped category:
Model B: reg Y B C1 C2 C3
If what I have said is correct, why are the p values different for C1
through C3 between the two models? And should not the p value for C4
in Model A be the same as for the constant in Model B?
--------------------------------------------------------------------------------
First question:
In the first parameterization (no constant), the regression coefficients
for C1 through C4 are the means of Y for each category, adjusted for B. In
the second parameterization, the coefficients for C1 through C3 are the
differences between the means of Y for categories 1 through 3 and the mean
for category 4, all adjusted for B. (See the coefficients below.) The null
hypotheses therefore differ: the first parameterization tests whether each
adjusted mean is equal to zero, while the second tests whether each adjusted
mean is equal to that for category 4. Different hypotheses give different
p values.
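The relation between the two sets of coefficients can be written out as a
short derivation. The symbols alpha, beta and gamma are my own notation, not
anything Stata reports; the only fact used is that exactly one dummy equals 1
for every observation, so C1 + C2 + C3 + C4 = 1.

```latex
\begin{align*}
\text{Model A:}\quad
E[Y] &= \beta B + \alpha_1 C_1 + \alpha_2 C_2 + \alpha_3 C_3 + \alpha_4 C_4 \\
\text{Substitute } C_4 = 1 - C_1 - C_2 - C_3:\quad
E[Y] &= \alpha_4 + \beta B + (\alpha_1 - \alpha_4) C_1
        + (\alpha_2 - \alpha_4) C_2 + (\alpha_3 - \alpha_4) C_3 \\
\text{Model B:}\quad
E[Y] &= \gamma_0 + \beta B + \gamma_1 C_1 + \gamma_2 C_2 + \gamma_3 C_3 \\
\text{so}\quad
\gamma_0 &= \alpha_4, \qquad \gamma_k = \alpha_k - \alpha_4 \quad (k = 1, 2, 3)
\end{align*}
```

With the estimates below, for example, 38.92806 - 39.22498 = -0.29692, which
is the C1 coefficient in the second model up to rounding.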
Second question:
Yes. The C4 row in Model A and the _cons row in Model B are identical; see
the results below.
Joseph Coveney
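* Example using the auto dataset shipped with Stata
sysuse auto, clear
rename mpg Y
rename weight B
* Collapse rep78 into four categories, then create dummies C1-C4
recode rep78 (5=4)
tabulate rep78, generate(C)
* Model A: all four dummies, no constant
regress Y B C1 C2 C3 C4, noconstant
* Model B: constant kept, C4 dropped as the reference category
regress Y B C1 C2 C3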
Results:
. regress Y B C1 C2 C3 C4, noconstant

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  5,    64) =  515.31
       Model |   32800.265     5  6560.05299           Prob > F      =  0.0000
    Residual |  814.735035    64  12.7302349           R-squared     =  0.9758
-------------+------------------------------           Adj R-squared =  0.9739
       Total |       33615    69  487.173913           Root MSE      =  3.5679

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           B |  -.0057832   .0005962    -9.70   0.000    -.0069744   -.0045921
          C1 |   38.92806    3.12755    12.45   0.000     32.68006    45.17606
          C2 |   38.52056   2.364302    16.29   0.000     33.79732    43.24379
          C3 |   38.51226   2.072076    18.59   0.000     34.37281    42.65171
          C4 |   39.22498    1.72017    22.80   0.000     35.78854    42.66141
------------------------------------------------------------------------------
. regress Y B C1 C2 C3

      Source |       SS       df       MS              Number of obs =      69
-------------+------------------------------           F(  4,    64) =   29.96
       Model |  1525.46786     4  381.366966           Prob > F      =  0.0000
    Residual |  814.735035    64  12.7302349           R-squared     =  0.6519
-------------+------------------------------           Adj R-squared =  0.6301
       Total |   2340.2029    68  34.4147485           Root MSE      =  3.5679

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           B |  -.0057832   .0005962    -9.70   0.000    -.0069744   -.0045921
          C1 |   -.296918   2.621481    -0.11   0.910    -5.533929    4.940093
          C2 |  -.7044196   1.483296    -0.47   0.636    -3.667644    2.258805
          C3 |  -.7127189   1.003684    -0.71   0.480    -2.717809    1.292371
       _cons |   39.22498    1.72017    22.80   0.000     35.78854    42.66141
------------------------------------------------------------------------------
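
As a further check, the differences that the second model reports can be
recovered from the first model with -lincom-. This is only a sketch, and it
repeats the data setup so that it runs on its own. Because the two models
are reparameterizations of each other, each -lincom- line should reproduce
the corresponding C1 through C3 row of the second regression. The joint
-test- at the end is my own addition, not something asked in the original
question.

```stata
* Setup: same renamed auto data and dummies as in the example above
sysuse auto, clear
rename mpg Y
rename weight B
recode rep78 (5=4)
tabulate rep78, generate(C)

* Fit the no-constant parameterization (adjusted mean for each category)
regress Y B C1 C2 C3 C4, noconstant

* Differences from category 4, i.e. what the second model estimates directly
lincom C1 - C4
lincom C2 - C4
lincom C3 - C4

* Joint test that all four adjusted means are equal (added for illustration)
test (C1 = C4) (C2 = C4) (C3 = C4)
```

The point of the exercise is that neither parameterization is "more correct";
each one simply puts a different null hypothesis in the coefficient table.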