Factor variables and value labels
A factor variable might be
When you fit a model, Stata allows factor-variable notation. You can type
i.attitude
to obtain the levels of factor variable attitude.
i.attitude#c.age
to obtain the levels of attitude interacted with continuous variable age
i.attitude##c.age
to mean i.attitude age i.attitude#c.age
i.attitude#i.agegrp
to obtain the levels of attitude interacted with the levels of agegrp
i.attitude##i.agegrp
to mean i.attitude i.agegrp i.attitude#i.agegrp
i.attitude#i.agegrp#i.region
to obtain the levels of attitude interacted with the levels of agegrp interacted with the levels of region
i.attitude##i.agegrp##i.region
to mean i.attitude i.agegrp i.region i.attitude#i.agegrp i.attitude#i.region i.agegrp#i.region i.attitude#i.agegrp#i.region
i.(attitude agegrp)
to mean i.attitude i.agegrp
i.(attitude agegrp)##i.region
to mean i.attitude##i.region i.agegrp##i.region
and so on.
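One way to convince yourself that the ## shorthand and the written-out form are the same model is to fit both and compare. The sketch below assumes Stata's bundled auto dataset (variables price, mpg, foreign, and rep78) as a stand-in for the data used on this page; the stored-estimate names are arbitrary.

```stata
* Assumption: the shipped auto dataset stands in for the data used on this page
sysuse auto, clear

* Shorthand: main effects plus the interaction
regress price i.foreign##c.mpg
estimates store shorthand

* The same model written out term by term
regress price i.foreign c.mpg i.foreign#c.mpg
estimates store longhand

* The two columns of coefficients should agree
estimates table shorthand longhand

* fvexpand lists the individual terms a factor-variable expression expands to
fvexpand i.(foreign rep78)
display "`r(varlist)'"
```

The same pattern should work for the three-way and parenthesized forms: swap in the expression you want to check and compare it with its longhand equivalent.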
Stata also has value labels. You might type
. label define regions 1 "North East" 2 "North Central" 3 "South" 4 "West"
. label values region regions
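For a version you can run end to end, the following sketch defines a value label, attaches it, and checks the result. It assumes the bundled auto dataset, whose rep78 variable is coded 1 through 5 without labels; the label name repairs and its texts are made up for illustration.

```stata
* Assumption: auto dataset; the label name and texts below are illustrative only
sysuse auto, clear

* Define a value label and attach it to rep78 (coded 1-5)
label define repairs 1 "Poor" 2 "Fair" 3 "Average" 4 "Good" 5 "Excellent"
label values rep78 repairs

* Inspect the mapping, then confirm that it appears in tabulated output
label list repairs
tabulate rep78
```

Defining the label and attaching it are separate steps, so one label definition can be attached to several variables that share the same coding.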
In Stata 13, when you fit a model using factor-variable notation, the labels appear in the output:
. regress y i.attitude i.agegrp i.region

      Source |       SS       df       MS              Number of obs =     400
-------------+------------------------------           F( 10,   389) =   22.60
       Model |  2668.04079    10  266.804079           Prob > F      =  0.0000
    Residual |  4592.44366   389  11.8057678           R-squared     =  0.3675
-------------+------------------------------           Adj R-squared =  0.3512
       Total |  7260.48445   399  18.1967029           Root MSE      =   3.436

---------------------------------------------------------------------------------
              y |      Coef.  Std. Err.      t     P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
       attitude |
       disagree |    1.27901   .5617435     2.28   0.023     .1745764    2.383443
        neutral |   1.466543   .5304032     2.76   0.006     .4237268    2.509358
          agree |   2.063136   .5326997     3.87   0.000     1.015805    3.110467
 strongly agree |   3.550927   .5801312     6.12   0.000     2.410343    4.691512
                |
         agegrp |
          31-40 |   2.114168   .4868806     4.34   0.000     1.156921    3.071414
          41-50 |   3.970627   .4866537     8.16   0.000     3.013826    4.927428
            50+ |   5.990408   .4869362    12.30   0.000     5.033052    6.947764
                |
         region |
  North Central |    .673176   .4913976     1.37   0.172    -.2929515    1.639304
          South |  -1.366099    .491862    -2.78   0.006     -2.33314   -.3990588
           West |  -1.477714   .4890703    -3.02   0.003    -2.439266   -.5161623
                |
          _cons |   8.411983   .5760115    14.60   0.000     7.279498    9.544468
---------------------------------------------------------------------------------

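If you would rather see the numeric level values, or need the names by which coefficients can be referenced, the display options nofvlabel and coeflegend may help. This is a sketch on the bundled auto dataset (an assumption; foreign there carries the value label origin), not the data shown above.

```stata
* Assumption: auto dataset, where foreign is labeled 0 "Domestic" 1 "Foreign"
sysuse auto, clear

* Labeled output: the level appears under its value label
regress price mpg i.foreign

* Replay the same results with numeric level values instead of labels
regress, nofvlabel

* Replay with the legend of coefficient names, such as _b[1.foreign]
regress, coeflegend

* Coefficients are always referenced by level value, never by label text
display _b[1.foreign]
```

Labels affect only how results are displayed; expressions such as _b[1.foreign] continue to use the underlying numeric codes.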
Value labels are also used by Stata's postestimation commands. Below we use pwcompare to compare y values for each pairing of the age groups:
. pwcompare agegrp

Pairwise comparisons of marginal linear predictions

Margins      : asbalanced

-----------------------------------------------------------------
                |                                 Unadjusted
                |   Contrast  Std. Err.      [95% Conf. Interval]
----------------+------------------------------------------------
         agegrp |
 31-40 vs 20-30 |   2.114168   .4868806      1.156921    3.071414
 41-50 vs 20-30 |   3.970627   .4866537      3.013826    4.927428
   50+ vs 20-30 |   5.990408   .4869362      5.033052    6.947764
 41-50 vs 31-40 |   1.856459   .4869484      .8990793    2.813839
   50+ vs 31-40 |    3.87624   .4870898      2.918582    4.833898
   50+ vs 41-50 |   2.019781   .4878207      1.060686    2.978876
-----------------------------------------------------------------

For instance, 31–40 year olds have an average value of y that is 2.11 higher than that of 20–30 year olds, controlling for the other covariates in the model.
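Each contrast is a difference between two fitted level effects. In the regression output above, for example, the 41-50 vs 31-40 contrast of 1.856459 is the agegrp coefficient for 41-50 minus the one for 31-40: 3.970627 - 2.114168. The sketch below reproduces that relationship on the bundled auto dataset (an assumption, with rep78 standing in for agegrp) and also requests tests adjusted for multiple comparisons.

```stata
* Assumption: auto dataset; rep78 (levels 1-5) stands in for agegrp
sysuse auto, clear
regress price mpg i.rep78

* All pairwise contrasts among the levels of rep78
pwcompare rep78

* Add test statistics and p-values, with a Bonferroni adjustment
pwcompare rep78, effects mcompare(bonferroni)

* A single contrast by hand: level 4 versus level 3
lincom 4.rep78 - 3.rep78
```

The lincom estimate should match the corresponding "4 vs 3" row of the unadjusted pwcompare table.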
To learn more about factor variables, see the manual entry.
To learn more about pwcompare, see its manual entry.