How do I keep all levels of my categorical variable in my model?
How do I specify a cell means model?
Title: Keeping all levels of a variable in the model
Author: Kenneth Higbee, StataCorp
In the following example, we use regress as our estimation command, but the
same thing applies to other estimation commands that have a noconstant option.
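Suppose you fit a model with i.rep78 and the noconstant option, obtain a
coefficient table such as

------------------------------------------------------------------------------
             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------

and then wonder why the first level of rep78 does not appear in your
regression table. If you add the baselevels option to your regression
command, you will see that the first level is treated as the base level and
has been omitted from the model.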
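. regress mpg i.rep78, noconstant baselevels

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 65)        =    188.12
       Model |  30942.2129         4  7735.55322   Prob > F        =    0.0000
    Residual |  2672.78712        65  41.1198019   R-squared       =    0.9205
-------------+----------------------------------   Adj R-squared   =    0.9156
       Total |       33615        69  487.173913   Root MSE        =    6.4125

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |          0  (base)
          2  |     19.125   2.267151     8.44   0.000     14.59719    23.65281
          3  |   19.43333   1.170752    16.60   0.000     17.09518    21.77149
          4  |   21.66667   1.511434    14.34   0.000     18.64812    24.68521
          5  |   27.36364   1.933433    14.15   0.000      23.5023    31.22497
------------------------------------------------------------------------------
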
The ibn. factor-variable operator specifies that a categorical variable
should be treated as if it has no base, or, in other words, that all levels of
the categorical variable are to be included in the model; see
[U] 11.4.3 Factor variables.
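
To see which levels a factor-variable specification expands to before fitting anything, a short check along these lines may help. It is only a sketch: it assumes the example data are Stata's shipped auto dataset (loaded here with sysuse auto, which the variable names mpg and rep78 suggest but the text above does not state), and it uses fvexpand, which leaves the expanded term list in r(varlist).

```stata
* Assumption: the examples use Stata's shipped auto dataset
sysuse auto, clear

* Default i. operator: the expanded list flags one level as the base
fvexpand i.rep78
display "`r(varlist)'"

* ibn. operator: the expanded list is requested with no base level
fvexpand ibn.rep78
display "`r(varlist)'"
```

Comparing the two displayed lists shows how the base-level marker differs between the i. and ibn. specifications.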
What happens when you specify that rep78 should have no base level but
leave the constant in the model?
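. regress mpg ibn.rep78
note: 5.rep78 omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(4, 64)        =      4.91
       Model |  549.415777         4  137.353944   Prob > F        =    0.0016
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.2348
-------------+----------------------------------   Adj R-squared   =    0.1869
       Total |   2340.2029        68  34.4147485   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |  -6.363636   4.066234    -1.56   0.123    -14.48687    1.759599
          2  |  -8.238636   2.457918    -3.35   0.001    -13.14889    -3.32838
          3  |  -7.930303    1.86452    -4.25   0.000    -11.65511   -4.205497
          4  |   -5.69697    2.02441    -2.81   0.006    -9.741193   -1.652747
          5  |          0  (omitted)
             |
       _cons |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------
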
One of the levels of rep78 is omitted from the model despite your
request that there be no base level for rep78. If a model contains both the
constant and all levels of a categorical variable, something must be dropped:
the indicators for the levels sum to one for every observation, so together
they are perfectly collinear with the constant.
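
The collinearity can be checked directly. The sketch below assumes the auto dataset (sysuse auto) and introduces two illustrative names that do not appear in the original text: the stub r for the indicator variables and rsum for their sum.

```stata
* Assumption: the examples use Stata's shipped auto dataset
sysuse auto, clear

* Create one 0/1 indicator per observed level of rep78: r1, r2, r3, r4, r5
tabulate rep78, generate(r)

* Sum the indicators; the sum is missing wherever rep78 is missing
generate byte rsum = r1 + r2 + r3 + r4 + r5

* Passes silently if the indicators add up to 1 wherever rep78 is observed
assert rsum == 1 if !missing(rep78)
```

Because the sum of the indicators reproduces the column of ones that the constant represents, the full set of indicators and the constant cannot all be estimated together.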
You need to use the ibn. operator on your categorical variable and the
noconstant option on your estimation command to obtain a cell means
model.
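. regress mpg ibn.rep78, noconstant

      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(5, 64)        =    227.47
       Model |  31824.2129         5  6364.84258   Prob > F        =    0.0000
    Residual |  1790.78712        64  27.9810488   R-squared       =    0.9467
-------------+----------------------------------   Adj R-squared   =    0.9426
       Total |       33615        69  487.173913   Root MSE        =    5.2897

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |
          1  |         21   3.740391     5.61   0.000     13.52771    28.47229
          2  |     19.125   1.870195    10.23   0.000     15.38886    22.86114
          3  |   19.43333   .9657648    20.12   0.000       17.504    21.36267
          4  |   21.66667   1.246797    17.38   0.000      19.1759    24.15743
          5  |   27.36364   1.594908    17.16   0.000     24.17744    30.54983
------------------------------------------------------------------------------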
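
In a cell means model, each coefficient is the mean of the outcome within that level. One way to confirm this, sketched here under the assumption that the data are the auto dataset (sysuse auto), is to compare the coefficients with the within-level means of mpg and to rebuild a cell mean from the model that keeps the constant.

```stata
* Assumption: the examples use Stata's shipped auto dataset
sysuse auto, clear

* Mean of mpg within each level of rep78, for comparison with the coefficients
tabulate rep78, summarize(mpg) means

* In the model with a constant, a cell mean is _cons plus the level's coefficient
regress mpg ibn.rep78
lincom _cons + 1.rep78
```

Using the estimates reported earlier, the constant 27.36364 plus the level 1 coefficient of -6.363636 gives approximately 21, which matches the level 1 coefficient in the cell means model.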