Principal components
Stata's pca command allows you to estimate parameters of principal-component models.
. webuse auto
(1978 Automobile Data)
. pca price mpg rep78 headroom weight length displacement foreign
Principal components/correlation         Number of obs   =       69
                                         Number of comp. =        8
                                         Trace           =        8
Rotation: (unrotated = principal)        Rho             =   1.0000
--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |       4.7823      3.51481       0.5978       0.5978
       Comp2 |       1.2675      .429638       0.1584       0.7562
       Comp3 |      .837857      .398188       0.1047       0.8610
       Comp4 |      .439668     .0670301       0.0550       0.9159
       Comp5 |      .372638      .210794       0.0466       0.9625
       Comp6 |      .161844     .0521133       0.0202       0.9827
       Comp7 |      .109731      .081265       0.0137       0.9964
       Comp8 |     .0284659            .       0.0036       1.0000
--------------------------------------------------------------------------
Principal components (eigenvectors)
--------------------------------------------------------------------------
    Variable |    Comp1    Comp2    Comp3    Comp4    Comp5    Comp6
-------------+------------------------------------------------------------
       price |   0.2324   0.6397  -0.3334  -0.2099   0.4974  -0.2815
         mpg |  -0.3897  -0.1065   0.0824   0.2568   0.6975   0.5011
       rep78 |  -0.2368   0.5697   0.3960   0.6256  -0.1650  -0.1928
    headroom |   0.2560  -0.0315   0.8439  -0.3750   0.2560  -0.1184
      weight |   0.4435   0.0979  -0.0325   0.1792  -0.0296   0.2657
      length |   0.4298   0.0687   0.0864   0.1845  -0.2438   0.4144
displacement |   0.4304   0.0851  -0.0445   0.1524   0.1782   0.2907
     foreign |  -0.3254   0.4820   0.0498  -0.5183  -0.2850   0.5401
--------------------------------------------------------------------------

------------------------------------------------
    Variable |    Comp7    Comp8  | Unexplained
-------------+--------------------+-------------
       price |   0.2165  -0.0891  |           0
         mpg |   0.1625   0.0115  |           0
       rep78 |  -0.0813   0.0065  |           0
    headroom |   0.0226   0.0252  |           0
      weight |   0.1104   0.8228  |           0
      length |   0.5437  -0.4921  |           0
displacement |  -0.7733  -0.2608  |           0
     foreign |  -0.1173   0.0639  |           0
------------------------------------------------
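We typed pca price mpg ... foreign. Most Stata commands share the same
basic syntax: the names of the variables follow the command's name (for
commands that have a dependent variable, it comes first and the independent
variables follow), and they are, optionally, followed by a comma and any
options. pca has no dependent variable, so the order of the variables does
not matter here. In this case, we did not specify any options.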
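As an illustration of the option syntax (a sketch of our own, not part of the
session shown on this page), pca's components() option limits how many
components are retained. Assuming we wanted to keep only the first two, we
might type
. pca price mpg rep78 headroom weight length displacement foreign, components(2)
With fewer than eight components retained, we would expect the Unexplained
column to no longer be all zeros. The rest of this page continues from the
original command, with all eight components retained.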
Having estimated the principal components, we can at any time type pca
by itself to redisplay the principal-component output. We can also type
screeplot to obtain a scree plot of the eigenvalues, and we can use the
predict command to obtain the components themselves.
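Because pca is an estimation command, its results also remain in memory for
inspection. As a minimal sketch, assuming the stored-result names e(Ev) for
the eigenvalues and e(L) for the eigenvectors, one could list everything pca
saved and then display those two matrices by typing
. ereturn list
. matrix list e(Ev)
. matrix list e(L)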
screeplot, typed by itself, graphs the eigenvalues against the component
number:
. screeplot

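Typing screeplot, yline(1) ci(het) adds a horizontal line at y = 1
and adds heteroskedastic bootstrap confidence intervals. The line at 1 is a
common reference for a correlation-based analysis: a component with an
eigenvalue above 1 explains more variance than any single standardized
variable.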
. screeplot, yline(1) ci(het)
We can obtain the first two components by typing
. predict pc1 pc2, score
(6 components skipped)
Scoring coefficients
sum-of-squares(column-loading) = 1
--------------------------------------------------------------------------
    Variable |    Comp1    Comp2    Comp3    Comp4    Comp5    Comp6
-------------+------------------------------------------------------------
       price |   0.2324   0.6397  -0.3334  -0.2099   0.4974  -0.2815
         mpg |  -0.3897  -0.1065   0.0824   0.2568   0.6975   0.5011
       rep78 |  -0.2368   0.5697   0.3960   0.6256  -0.1650  -0.1928
    headroom |   0.2560  -0.0315   0.8439  -0.3750   0.2560  -0.1184
      weight |   0.4435   0.0979  -0.0325   0.1792  -0.0296   0.2657
      length |   0.4298   0.0687   0.0864   0.1845  -0.2438   0.4144
displacement |   0.4304   0.0851  -0.0445   0.1524   0.1782   0.2907
     foreign |  -0.3254   0.4820   0.0498  -0.5183  -0.2850   0.5401
--------------------------------------------------------------------------

----------------------------------
    Variable |    Comp7    Comp8
-------------+--------------------
       price |   0.2165  -0.0891
         mpg |   0.1625   0.0115
       rep78 |  -0.0813   0.0065
    headroom |   0.0226   0.0252
      weight |   0.1104   0.8228
      length |   0.5437  -0.4921
displacement |  -0.7733  -0.2608
     foreign |  -0.1173   0.0639
----------------------------------
The score option tells Stata's predict command to compute the
scores of the components, and pc1 and pc2 are the names we have
chosen for the two new variables. We could have obtained the first three
components by typing, for example, predict pc1 pc2 pc3, score.
An important feature of Stata is that it does not have modes or modules. We
typed pca to estimate the principal components. We then typed
screeplot to see a graph of the eigenvalues; we did not have to
save the data and change modules. Similarly, we typed predict pc1 pc2,
score to obtain the first two components. The new variables, pc1
and pc2, are now part of our data and are ready for use; we could now
use regress to fit a regression model.
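For example, a regression on the two scores might look like the following
sketch. The choice of trunk (trunk space, a variable in the auto data that
was not in the pca variable list) as the outcome is ours and is purely
illustrative.
. regress trunk pc1 pc2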
The two components should have correlation 0. The correlate command, like
every other Stata command, is always available for use, so to verify that
the correlation between pc1 and pc2 is zero, we type
. correlate pc1 pc2
(obs=69)
| pc1 pc2
-------------+------------------
pc1 | 1.0000
pc2 | 0.0000 1.0000
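A related check, sketched here as our own addition: because the scoring
coefficients are normalized to have sum of squares 1, the variance of each
score should equal the corresponding eigenvalue (about 4.78 for pc1 and
1.27 for pc2). The covariance option of correlate displays variances and
covariances instead of correlations:
. correlate pc1 pc2, covariance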
See New in Stata 10 for more about what was added in Stata Release 10.