Factor-variable notation is a collection of prefixes and operators that allows us to specify regression models quickly and easily. We can distinguish between continuous and categorical variables, select reference categories, specify interactions between variables, and include polynomials of continuous variables. And factor-variable notation works with nearly all of Stata's regression commands such as regress, probit, logit, and poisson.
Let's begin by opening the nhanes2l dataset. Then let's describe and summarize the variables bpsystol, hlthstat, diabetes, age, and bmi.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)
. describe bpsystol hlthstat diabetes age bmi
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------
bpsystol          int    %9.0g                Systolic blood pressure
hlthstat         byte    %20.0g    hlth       Health status
diabetes         byte    %12.0g    diabetes   Diabetes status
age              byte    %9.0g                Age (years)
bmi             float    %9.0g                Body mass index (BMI)
. summarize bpsystol hlthstat diabetes age bmi

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    bpsystol |     10,351    130.8817    23.33265         65        300
    hlthstat |     10,335    2.586164    1.206196          1          5
    diabetes |     10,349    .0482172    .2142353          0          1
         age |     10,351    47.57965    17.21483         20         74
         bmi |     10,351     25.5376    4.914969    12.3856    61.1297
We are going to fit a series of linear regression models for the outcome variable bpsystol, which measures systolic blood pressure and ranges from 65 to 300 mmHg. hlthstat measures health status on a scale from 1 to 5. diabetes is a 0/1 indicator of diabetes status. age measures age and ranges from 20 to 74 years. And bmi measures body mass index and ranges from 12.4 to 61.1 kg/m2.
Factor-variable notation for categorical variables
Let's begin with a model including the predictor variable hlthstat. We suspect that hlthstat is a categorical variable because its description shows a value label named “hlth” and its summary has a minimum value of 1 and a maximum value of 5. Let's use label list to view the category labels.
. label list hlth
hlth:
1 Excellent
2 Very good
3 Good
4 Fair
5 Poor
.a Blank but applicable
hlthstat has five categories labeled Excellent, Very good, Good, Fair, and Poor. Stata's regression commands treat predictor variables as continuous by default, so we need to create indicator variables for each category of hlthstat. We could do this manually, but it is easier to use the “i.” prefix. The “i.” prefix is factor-variable notation that tells Stata a variable is categorical, and Stata will create temporary indicator variables for us automatically. Let's type list hlthstat i.hlthstat in 1/10 to see how it works for the first 10 observations.
. list hlthstat i.hlthstat in 1/10
     +-----------------------------------------------------------------+
     |                    1.         2.         3.         4.       5. |
     |  hlthstat   hlthstat   hlthstat   hlthstat   hlthstat  hlthstat |
     |-----------------------------------------------------------------|
  1. | Very good          0          1          0          0         0 |
  2. | Very good          0          1          0          0         0 |
  3. |      Good          0          0          1          0         0 |
  4. |      Fair          0          0          0          1         0 |
  5. | Very good          0          1          0          0         0 |
     |-----------------------------------------------------------------|
  6. |      Poor          0          0          0          0         1 |
  7. | Very good          0          1          0          0         0 |
  8. | Excellent          1          0          0          0         0 |
  9. | Very good          0          1          0          0         0 |
 10. |      Poor          0          0          0          0         1 |
     +-----------------------------------------------------------------+
The first column lists the value of hlthstat for the first 10 observations in our dataset. The next five columns, named 1.hlthstat through 5.hlthstat, are temporary indicator variables that Stata created for us. Category 1 in hlthstat is labeled “Excellent”, so the indicator variable 1.hlthstat will equal 1 when hlthstat equals “Excellent” and 0 otherwise. Category 2 in hlthstat is labeled “Very good”, so the indicator variable 2.hlthstat will equal 1 when hlthstat equals “Very good” and 0 otherwise. The indicator variables 3.hlthstat, 4.hlthstat, and 5.hlthstat follow the same pattern for “Good”, “Fair”, and “Poor”, respectively. Note that the indicator variables do not remain in the dataset after the command finishes running.
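If we did want permanent indicator variables, the manual approach might look like the sketch below (the variable names hlth1 through hlth5 are our own invention):

```stata
* Manual alternative to the i. prefix: one 0/1 indicator per category,
* set to missing wherever hlthstat itself is missing
forvalues k = 1/5 {
    generate byte hlth`k' = (hlthstat == `k') if !missing(hlthstat)
}
```

With the “i.” prefix, none of this bookkeeping is necessary, and nothing is left behind in the dataset.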
We can use the “i.” prefix with regress to treat hlthstat as a categorical predictor variable.
. regress bpsystol i.hlthstat
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Very good  |   2.981587   .6415165     4.65   0.000      1.72409    4.239083
       Good  |   8.034913   .6230047    12.90   0.000     6.813703    9.256123
       Fair  |   14.71925    .721698    20.40   0.000     13.30459    16.13392
       Poor  |   16.42304   .9580047    17.14   0.000     14.54517    18.30092
             |
       _cons |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
------------------------------------------------------------------------------
The output includes a coefficient for the intercept, labeled “_cons”, as well as coefficients for “Very good”, “Good”, “Fair”, and “Poor”. The “Excellent” category was automatically omitted from the model and used as the comparison group, called the “reference category”. By default, Stata selects the category with the smallest value as the reference category, and in this model the intercept equals the mean of the outcome in that category. So the mean systolic blood pressure in the “Excellent” category is 124.3 mmHg. Each remaining coefficient is the difference between the mean outcome in that category and the mean in the reference category. For example, the coefficient for the “Poor” group is 16.4, so the mean systolic blood pressure in the “Poor” group is 16.4 mmHg higher than in the “Excellent” group.
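We can check this interpretation by computing the group means directly; tabstat is one way to do so (the mean for the “Poor” group should equal _cons plus the Poor coefficient, 124.3 + 16.4 ≈ 140.7):

```stata
* Mean systolic blood pressure within each health-status category
tabstat bpsystol, by(hlthstat) statistics(mean)
```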
We can select a different reference category using the “ib(#).” prefix, where “#” is the category number for the reference category. Let's use hlthstat category 5, “Poor”, as the reference category.
. regress bpsystol ib(5).hlthstat
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Excellent  |  -16.42304   .9580047   -17.14   0.000    -18.30092   -14.54517
  Very good  |  -13.44146   .9500643   -14.15   0.000    -15.30377   -11.57915
       Good  |   -8.38813    .937664    -8.95   0.000    -10.22613   -6.550127
       Fair  |  -1.703789   1.005946    -1.69   0.090    -3.675638    .2680593
             |
       _cons |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------
The “Poor” category is now omitted from the output and “Excellent” is included. The coefficient for _cons, 140.7, is now the mean systolic blood pressure in the “Poor” group, and the mean systolic blood pressure in the “Excellent” group is 16.4 mmHg lower than the “Poor” group.
We can also use the prefix “ib(frequent).” to select the category with the largest sample size. We can type tabulate hlthstat to verify that the “Good” category has the largest sample size.
. tabulate hlthstat
       Health |
       status |      Freq.     Percent        Cum.
--------------+-----------------------------------
    Excellent |      2,407       23.29       23.29
    Very good |      2,591       25.07       48.36
         Good |      2,938       28.43       76.79
         Fair |      1,670       16.16       92.95
         Poor |        729        7.05      100.00
--------------+-----------------------------------
        Total |     10,335      100.00
. regress bpsystol ib(frequent).hlthstat

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(4, 10330)     =    158.34
       Model |  325244.686         4  81311.1715   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.0578
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Excellent  |  -8.034913   .6230047   -12.90   0.000    -9.256123   -6.813703
  Very good  |  -5.053326   .6107242    -8.27   0.000    -6.250464   -3.856189
       Fair  |   6.684341   .6944701     9.63   0.000     5.323045    8.045637
       Poor  |    8.38813    .937664     8.95   0.000     6.550127    10.22613
             |
       _cons |    132.354   .4180763   316.58   0.000     131.5345    133.1735
------------------------------------------------------------------------------
We can also use the prefix “ib(none).” to omit the reference category. This will display the mean outcome for each category when combined with the noconstant option.
. regress bpsystol ib(none).hlthstat, noconstant
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(5, 10330)     =  69083.04
       Model |   177379866         5  35475973.3   Prob > F        =    0.0000
    Residual |  5304728.67    10,330  513.526492   R-squared       =    0.9710
-------------+----------------------------------   Adj R-squared   =    0.9709
       Total |   182684595    10,335  17676.3033   Root MSE        =    22.661

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    hlthstat |
  Excellent  |   124.3191   .4618951   269.15   0.000     123.4137    125.2245
  Very good  |   127.3007   .4451924   285.95   0.000      126.428    128.1733
       Good  |    132.354   .4180763   316.58   0.000     131.5345    133.1735
       Fair  |   139.0383   .5545276   250.73   0.000     137.9513    140.1253
       Poor  |   140.7421   .8393008   167.69   0.000     139.0969    142.3873
------------------------------------------------------------------------------
The output tells us that the mean systolic blood pressure in the “Excellent” category is 124.3 and the mean systolic blood pressure in the “Poor” group is 140.7.
Factor-variable notation for binary variables
Binary variables are simply categorical variables with two categories, so everything we discussed above applies to binary variables. Binary variables are often coded as “0/1” indicator variables, but you should still use the “i.” prefix if you plan to use postestimation commands, such as margins, after you fit a regression model. Let's look at a few quick examples in the interest of completeness.
Here is a model that includes diabetes as a binary predictor variable.
. regress bpsystol i.diabetes
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(1, 10347)     =    244.99
       Model |  130296.034         1  130296.034   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.0231
-------------+----------------------------------   Adj R-squared   =    0.0230
       Total |  5633280.05    10,348   544.38346   Root MSE        =    23.062

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    diabetes |
   Diabetic  |   16.56328   1.058212    15.65   0.000     14.48898    18.63758
       _cons |    130.088   .2323666   559.84   0.000     129.6325    130.5435
------------------------------------------------------------------------------
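Because we used the “i.” prefix, postestimation commands understand that diabetes is categorical. For instance, after fitting the model above, margins can report the predicted mean of bpsystol in each group (a quick sketch):

```stata
* Predicted mean systolic blood pressure for each diabetes group
margins diabetes
```

The two margins should match _cons (130.1) and _cons plus the Diabetic coefficient (146.7).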
Let's use factor-variable notation to select people with diabetes as the reference category.
. regress bpsystol ib(1).diabetes
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(1, 10347)     =    244.99
       Model |  130296.034         1  130296.034   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.0231
-------------+----------------------------------   Adj R-squared   =    0.0230
       Total |  5633280.05    10,348   544.38346   Root MSE        =    23.062

--------------------------------------------------------------------------------
      bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      diabetes |
 Not diabetic  |  -16.56328   1.058212   -15.65   0.000    -18.63758   -14.48898
         _cons |   146.6513   1.032385   142.05   0.000     144.6276     148.675
--------------------------------------------------------------------------------
Let's fit a model with no intercept and no reference category.
. regress bpsystol ib(none).diabetes, noconstant
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(2, 10347)     >  99999.00
       Model |   177422292         2    88711146   Prob > F        =    0.0000
    Residual |  5502984.01    10,347  531.843434   R-squared       =    0.9699
-------------+----------------------------------   Adj R-squared   =    0.9699
       Total |   182925276    10,349  17675.6475   Root MSE        =    23.062

--------------------------------------------------------------------------------
      bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      diabetes |
 Not diabetic  |    130.088   .2323666   559.84   0.000     129.6325    130.5435
     Diabetic  |   146.6513   1.032385   142.05   0.000     144.6276     148.675
--------------------------------------------------------------------------------
Factor-variable notation for continuous variables
Stata's regression commands treat predictor variables as continuous by default. But you can use the “c.” prefix to tell Stata explicitly that a predictor variable should be treated as continuous. This will be necessary when you include continuous variables in interactions with other variables.
Here is a quick example treating age as a continuous predictor variable.
. regress bpsystol c.age
      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(1, 10349)     =   3116.79
       Model |  1304200.02         1  1304200.02   Prob > F        =    0.0000
    Residual |  4330470.01    10,349  418.443328   R-squared       =    0.2315
-------------+----------------------------------   Adj R-squared   =    0.2314
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.456

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .6520775   .0116801    55.83   0.000     .6291823    .6749727
       _cons |   99.85603   .5909867   168.96   0.000     98.69758    101.0145
------------------------------------------------------------------------------
Factor-variable notation for interactions
Factor-variable notation also includes two operators. The “#” operator specifies an interaction between two variables, and the “##” operator specifies both the main effects and interaction of two variables.
Let's fit a model that includes the main effects for hlthstat and diabetes and use the “#” operator to include their interaction.
. regress bpsystol i.hlthstat i.diabetes i.hlthstat#i.diabetes
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(9, 10325)     =     86.92
       Model |  396524.045         9  44058.2272   Prob > F        =    0.0000
    Residual |  5233449.31    10,325  506.871604   R-squared       =    0.0704
-------------+----------------------------------   Adj R-squared   =    0.0696
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.514

--------------------------------------------------------------------------------------
            bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
            hlthstat |
          Very good  |   2.636051   .6417076     4.11   0.000      1.37818    3.893922
               Good  |   7.648725   .6272209    12.19   0.000     6.419251      8.8782
               Fair  |   13.50647   .7408272    18.23   0.000      12.0543    14.95863
               Poor  |   14.77223   1.032484    14.31   0.000     12.74837     16.7961
                     |
            diabetes |
           Diabetic  |   5.780232   4.618696     1.25   0.211    -3.273308    14.83377
                     |
   hlthstat#diabetes |
 Very good#Diabetic  |   17.43339   5.726714     3.04   0.002     6.207924    28.65886
      Good#Diabetic  |   4.023894   5.032308     0.80   0.424    -5.840404    13.88819
      Fair#Diabetic  |   7.316062    4.97969     1.47   0.142    -2.445096    17.07722
      Poor#Diabetic  |   3.445358    5.09316     0.68   0.499    -6.538222    13.42894
                     |
               _cons |   124.2614   .4611975   269.43   0.000     123.3574    125.1655
--------------------------------------------------------------------------------------
We could fit the same model using the “##” operator.
. regress bpsystol i.hlthstat##i.diabetes
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(9, 10325)     =     86.92
       Model |  396524.045         9  44058.2272   Prob > F        =    0.0000
    Residual |  5233449.31    10,325  506.871604   R-squared       =    0.0704
-------------+----------------------------------   Adj R-squared   =    0.0696
       Total |  5629973.35    10,334  544.800982   Root MSE        =    22.514

--------------------------------------------------------------------------------------
            bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------+----------------------------------------------------------------
            hlthstat |
          Very good  |   2.636051   .6417076     4.11   0.000      1.37818    3.893922
               Good  |   7.648725   .6272209    12.19   0.000     6.419251      8.8782
               Fair  |   13.50647   .7408272    18.23   0.000      12.0543    14.95863
               Poor  |   14.77223   1.032484    14.31   0.000     12.74837     16.7961
                     |
            diabetes |
           Diabetic  |   5.780232   4.618696     1.25   0.211    -3.273308    14.83377
                     |
   hlthstat#diabetes |
 Very good#Diabetic  |   17.43339   5.726714     3.04   0.002     6.207924    28.65886
      Good#Diabetic  |   4.023894   5.032308     0.80   0.424    -5.840404    13.88819
      Fair#Diabetic  |   7.316062    4.97969     1.47   0.142    -2.445096    17.07722
      Poor#Diabetic  |   3.445358    5.09316     0.68   0.499    -6.538222    13.42894
                     |
               _cons |   124.2614   .4611975   269.43   0.000     123.3574    125.1655
--------------------------------------------------------------------------------------
We can include interactions with continuous variables too.
. regress bpsystol i.diabetes##c.age
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(3, 10345)     =   1071.05
       Model |  1335031.79         3  445010.595   Prob > F        =    0.0000
    Residual |  4298248.26    10,345  415.490407   R-squared       =    0.2370
-------------+----------------------------------   Adj R-squared   =    0.2368
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.384

--------------------------------------------------------------------------------
      bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      diabetes |
     Diabetic  |  -5.669005   4.952369    -1.14   0.252    -15.37661    4.038595
           age |   .6303981   .0119464    52.77   0.000     .6069808    .6538154
               |
diabetes#c.age |
     Diabetic  |   .2233087   .0804934     2.77   0.006      .065526    .3810913
               |
         _cons |   100.5111   .5969456   168.38   0.000     99.34096    101.6812
--------------------------------------------------------------------------------
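In this model, the slope of age among people with diabetes is the main effect of age plus the interaction coefficient, .6303981 + .2233087 ≈ .85. After fitting the model above, lincom can compute that sum along with a standard error (a sketch; 1.diabetes refers to the “Diabetic” level):

```stata
* Age slope among diabetics = main effect of age + interaction term
lincom c.age + 1.diabetes#c.age
```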
We can even include three-way and higher-order interactions using the “#” and “##” operators.
. regress bpsystol i.hlthstat##i.diabetes##c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(19, 10315)    =    173.56
       Model |  1363865.23        19  71782.3807   Prob > F        =    0.0000
    Residual |  4266108.12    10,315  413.582949   R-squared       =    0.2423
-------------+----------------------------------   Adj R-squared   =    0.2409
       Total |  5629973.35    10,334  544.800982   Root MSE        =    20.337

--------------------------------------------------------------------------------------------
                  bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------------------+----------------------------------------------------------------
                  hlthstat |
                Very good  |  -.2522701   1.571793    -0.16   0.872    -3.333289    2.828748
                     Good  |  -1.269239   1.640212    -0.77   0.439    -4.484373    1.945895
                     Fair  |  -1.892737   2.323042    -0.81   0.415    -6.446351    2.660877
                     Poor  |  -1.470403   4.440142    -0.33   0.741    -10.17394    7.233137
                           |
                  diabetes |
                 Diabetic  |   5.648359   16.10149     0.35   0.726    -25.91369    37.21041
                           |
         hlthstat#diabetes |
       Very good#Diabetic  |   .6634293   26.12969     0.03   0.980    -50.55583    51.88269
            Good#Diabetic  |  -16.56507   18.00713    -0.92   0.358    -51.86255     18.7324
            Fair#Diabetic  |  -7.761426   18.83079    -0.41   0.680    -44.67343    29.15058
            Poor#Diabetic  |  -5.055061   20.09251    -0.25   0.801    -44.44028    34.33016
                           |
                       age |   .5505586   .0261998    21.01   0.000      .499202    .6019153
                           |
            hlthstat#c.age |
                Very good  |    .026618   .0352546     0.76   0.450    -.0424879    .0957239
                     Good  |    .084684   .0349617     2.42   0.015     .0161522    .1532157
                     Fair  |   .1210264   .0438944     2.76   0.006     .0349849    .2070679
                     Poor  |   .0900039   .0752338     1.20   0.232     -.057469    .2374768
                           |
            diabetes#c.age |
                 Diabetic  |  -.1428421   .2867743    -0.50   0.618    -.7049754    .4192913
                           |
   hlthstat#diabetes#c.age |
       Very good#Diabetic  |   .2297988   .4324672     0.53   0.595    -.6179209    1.077518
            Good#Diabetic  |   .3910658    .316956     1.23   0.217    -.2302295    1.012361
            Fair#Diabetic  |   .3139083   .3258971     0.96   0.335    -.3249132    .9527298
            Poor#Diabetic  |     .26957   .3465917     0.78   0.437     -.409817     .948957
                           |
                     _cons |   102.2407   1.127687    90.66   0.000     100.0302    104.4512
--------------------------------------------------------------------------------------------
We have already learned that Stata treats predictor variables as continuous by default. But the opposite is true with interaction operators. Both “#” and “##” treat variables as categorical predictors if you do not specify a prefix. So typing hlthstat##diabetes would work. But typing diabetes##age would make a mess because age would be treated as a categorical variable by default. When in doubt, use the “i.” and “c.” prefixes to avoid mistakes.
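For example, the commands below make each variable's type explicit; the commented-out version would instead expand age into dozens of indicator variables:

```stata
* Safe: every variable's type is stated explicitly
regress bpsystol i.hlthstat##i.diabetes    // both categorical
regress bpsystol i.diabetes##c.age         // age stays continuous

* Risky: with no prefix, ## would treat age as categorical,
* creating one indicator for each distinct age from 20 to 74
* regress bpsystol diabetes##age
```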
The prefixes also have a “distributive property” when used with parentheses. The syntax below treats hlthstat and diabetes as categorical predictors and fits a model that includes their main effects as well as their interactions with age. Note that the model will not include the interaction of hlthstat and diabetes.
. regress bpsystol i.(hlthstat diabetes)##c.age
      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(11, 10323)    =    298.72
       Model |  1359359.05        11  123578.096   Prob > F        =    0.0000
    Residual |   4270614.3    10,323  413.698954   R-squared       =    0.2415
-------------+----------------------------------   Adj R-squared   =    0.2406
       Total |  5629973.35    10,334  544.800982   Root MSE        =     20.34

--------------------------------------------------------------------------------
      bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      hlthstat |
    Very good  |  -.5801787    1.56339    -0.37   0.711    -3.644726    2.484369
         Good  |  -1.453802   1.627043    -0.89   0.372    -4.643121    1.735517
         Fair  |  -2.078403   2.286625    -0.91   0.363     -6.56063    2.403824
         Poor  |  -.9296666   4.211361    -0.22   0.825     -9.18475    7.325417
               |
      diabetes |
     Diabetic  |  -5.664698   5.022147    -1.13   0.259    -15.50908    4.179683
           age |   .5433911   .0259983    20.90   0.000     .4924295    .5943527
               |
hlthstat#c.age |
    Very good  |   .0382409    .034885     1.10   0.273    -.0301404    .1066222
         Good  |   .0887067   .0345224     2.57   0.010      .021036    .1563773
         Fair  |   .1300174   .0430386     3.02   0.003     .0456535    .2143813
         Poor  |   .0888559   .0713922     1.24   0.213    -.0510867    .2287985
               |
diabetes#c.age |
     Diabetic  |   .2067666   .0816404     2.53   0.011     .0467356    .3667976
               |
         _cons |   102.4518   1.122841    91.24   0.000     100.2508    104.6528
--------------------------------------------------------------------------------
Factor-variable notation for polynomials
We can also use the “#” and “##” operators to specify polynomial terms for continuous variables. For example, we may wish to fit a model that includes both age and the square of age. We can do this by interacting age with itself.
. regress bpsystol c.age##c.age
      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(2, 10348)     =   1592.42
       Model |  1326071.99         2  663035.995   Prob > F        =    0.0000
    Residual |  4308598.04    10,348  416.370123   R-squared       =    0.2353
-------------+----------------------------------   Adj R-squared   =    0.2352
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.405

------------------------------------------------------------------------------
    bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0345687   .0859928     0.40   0.688    -.1339939    .2031312
             |
 c.age#c.age |   .0066366   .0009157     7.25   0.000     .0048417    .0084315
             |
       _cons |   112.2463   1.808325    62.07   0.000     108.7017     115.791
------------------------------------------------------------------------------
We could include a term for age cubed.
. regress bpsystol c.age##c.age##c.age
      Source |       SS           df       MS      Number of obs   =    10,351
-------------+----------------------------------   F(3, 10347)     =   1065.37
       Model |   1329759.5         3  443253.167   Prob > F        =    0.0000
    Residual |  4304910.52    10,347  416.053979   R-squared       =    0.2360
-------------+----------------------------------   Adj R-squared   =    0.2358
       Total |  5634670.03    10,350  544.412563   Root MSE        =    20.397

-------------------------------------------------------------------------------------
           bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
                age |  -1.107037   .3929805    -2.82   0.005    -1.877355   -.3367196
                    |
        c.age#c.age |   .0329455   .0088844     3.71   0.000     .0155303    .0503607
                    |
  c.age#c.age#c.age |  -.0001879   .0000631    -2.98   0.003    -.0003116   -.0000642
                    |
              _cons |   112.2463   1.808325    62.07   0.000     108.7017     115.791
-------------------------------------------------------------------------------------
We can also include the square of age when we include an interaction of age with another variable.
. regress bpsystol i.diabetes##c.age c.age#c.age
      Source |       SS           df       MS      Number of obs   =    10,349
-------------+----------------------------------   F(4, 10344)     =    817.53
       Model |  1353111.75         4  338277.939   Prob > F        =    0.0000
    Residual |  4280168.29    10,344  413.782704   R-squared       =    0.2402
-------------+----------------------------------   Adj R-squared   =    0.2399
       Total |  5633280.05    10,348   544.38346   Root MSE        =    20.342

--------------------------------------------------------------------------------
      bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
      diabetes |
     Diabetic  |  -.8886553   4.994811    -0.18   0.859    -10.67945    8.902141
           age |   .0640567   .0865028     0.74   0.459    -.1055054    .2336188
               |
diabetes#c.age |
     Diabetic  |   .1403559   .0813022     1.73   0.084    -.0190122    .2997239
               |
   c.age#c.age |   .0061116   .0009246     6.61   0.000     .0042992    .0079239
               |
         _cons |    111.823   1.812009    61.71   0.000     108.2711    115.3749
--------------------------------------------------------------------------------
You can read more about factor-variable notation in the Stata documentation. You can also watch a demonstration of these commands by clicking on the links to the YouTube videos below.