Stata’s pca allows you to estimate parameters of principal-component models.

. webuse auto
(1978 Automobile Data)

. pca price mpg rep78 headroom weight length displacement

Principal components/correlation            Number of obs    =        69
                                            Number of comp.  =         7
                                            Trace            =         7
    Rotation: (unrotated = principal)       Rho              =    1.0000
    
   Component |   Eigenvalue   Difference   Proportion   Cumulative
-------------+----------------------------------------------------
       Comp1 |      4.34052      3.32161       0.6201       0.6201
       Comp2 |      1.01891      .184366       0.1456       0.7656
       Comp3 |      .834546      .443705       0.1192       0.8849
       Comp4 |      .390842      .116964       0.0558       0.9407
       Comp5 |      .273877      .162216       0.0391       0.9798
       Comp6 |      .111662     .0820227       0.0160       0.9958
       Comp7 |     .0296392            .       0.0042       1.0000

Principal components (eigenvectors)

    Variable |    Comp1    Comp2    Comp3    Comp4    Comp5    Comp6
-------------+------------------------------------------------------
       price |   0.2761   0.6781  -0.2652   0.5810  -0.1570   0.1653
         mpg |  -0.4186   0.0202   0.1017   0.3700   0.7906   0.2281
       rep78 |  -0.2222   0.7039   0.4923  -0.4419   0.0433  -0.1222
    headroom |   0.2713  -0.2016   0.8172   0.4367  -0.1624   0.0068
      weight |   0.4660   0.0442  -0.0304  -0.1611   0.2893   0.1408
      length |   0.4525  -0.0128   0.0808  -0.3368   0.2070   0.6132
displacement |   0.4513   0.0388  -0.0420   0.0116   0.4421  -0.7141

    Variable |    Comp7  Unexplained
-------------+----------------------
       price |   0.0632            0
         mpg |   0.0050            0
       rep78 |  -0.0259            0
    headroom |  -0.0293            0
      weight |  -0.8065            0
      length |   0.5063            0
displacement |   0.2961            0
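
The Proportion and Cumulative columns can be reproduced by hand: each proportion is the eigenvalue divided by the trace, which is 7 here because the analysis is based on the correlation matrix of seven variables. A minimal sketch of that arithmetic, using the rounded eigenvalues printed above:

```stata
* Proportion for Comp1: eigenvalue divided by the trace
display %6.4f 4.34052/7

* Cumulative proportion through Comp2
display %6.4f (4.34052 + 1.01891)/7
```

The two results, 0.6201 and 0.7656, match the first two rows of the Proportion and Cumulative columns.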

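We typed pca price mpg ... displacement. All Stata commands share the same syntax: the names of the variables follow the command's name (for commands that distinguish them, the dependent variable comes first and then the independent variables), and they are, optionally, followed by a comma and any options. pca has no dependent variable, so we simply listed the variables to analyze. In this case, we did not specify any options.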
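
To illustrate what specifying options looks like, here is a minimal sketch that uses two of pca's retention options, components() and mineigen(). The choice of 2 and 1 as thresholds is ours, for illustration only.

```stata
webuse auto, clear

* Retain only the first two components
pca price mpg rep78 headroom weight length displacement, components(2)

* Alternatively, retain only components whose eigenvalue exceeds 1
pca price mpg rep78 headroom weight length displacement, mineigen(1)
```

With the eigenvalues shown above, both commands retain Comp1 and Comp2, and the Unexplained column is then no longer zero because the discarded components carry some of the variance.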

Having estimated the principal components, we can at any time type pca by itself to redisplay the principal-component output. We can also type screeplot to obtain a scree plot of the eigenvalues, and we can use the predict command to obtain the components themselves.

screeplot, typed by itself, graphs the eigenvalue of each component against the component number:

. screeplot

Typing screeplot, yline(1) ci(het) adds a horizontal reference line at an eigenvalue of 1 and adds heteroskedastic bootstrap confidence intervals.

. screeplot, yline(1) ci(het)

We can obtain the first two components by typing

. predict pc1 pc2, score
(5 components skipped)

Scoring coefficients
    sum of squares(column-loading) = 1

    Variable |    Comp1    Comp2    Comp3    Comp4    Comp5    Comp6
-------------+------------------------------------------------------
       price |   0.2761   0.6781  -0.2652   0.5810  -0.1570   0.1653
         mpg |  -0.4186   0.0202   0.1017   0.3700   0.7906   0.2281
       rep78 |  -0.2222   0.7039   0.4923  -0.4419   0.0433  -0.1222
    headroom |   0.2713  -0.2016   0.8172   0.4367  -0.1624   0.0068
      weight |   0.4660   0.0442  -0.0304  -0.1611   0.2893   0.1408
      length |   0.4525  -0.0128   0.0808  -0.3368   0.2070   0.6132
displacement |   0.4513   0.0388  -0.0420   0.0116   0.4421  -0.7141

    Variable |    Comp7
-------------+---------
       price |   0.0632
         mpg |   0.0050
       rep78 |  -0.0259
    headroom |  -0.0293
      weight |  -0.8065
      length |   0.5063
displacement |   0.2961

The score option tells Stata's predict command to compute the scores of the components, and pc1 and pc2 are the names we have chosen for the two new variables. We could have obtained the first three components by typing, for example, predict pc1 pc2 pc3, score.
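
To make the scoring coefficients concrete, the following sketch rebuilds the first score by hand. It assumes that, for a correlation-based analysis like this one, each score is the sum of the scoring coefficients times the variables standardized over the estimation sample. The variable name pc1_manual and the matrix name L are our own choices.

```stata
webuse auto, clear
pca price mpg rep78 headroom weight length displacement
predict pc1, score

* Copy the eigenvector matrix stored by pca (variables in rows)
matrix L = e(L)

* Accumulate coefficient * standardized variable over the estimation sample
generate double pc1_manual = 0 if e(sample)
local i = 0
foreach v of varlist price mpg rep78 headroom weight length displacement {
    local ++i
    quietly summarize `v' if e(sample)
    quietly replace pc1_manual = pc1_manual + L[`i', 1] * (`v' - r(mean)) / r(sd) if e(sample)
}

* Compare the hand-built score with the one from predict
summarize pc1 pc1_manual
```

If the assumption holds, pc1 and pc1_manual agree up to the rounding introduced by storing pc1 as a float.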

An important feature of Stata is that it does not have modes or modules. We typed pca to estimate the principal components. We then typed screeplot to see a graph of the eigenvalues; we did not have to save the data and change modules. Similarly, we typed predict pc1 pc2, score to obtain the first two components. The new variables, pc1 and pc2, are now part of our data and are ready for use; we could now use regress to fit a regression model.
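
As a sketch of that last step, the commands below regress a variable that was not part of the analysis on the two scores. Using trunk (trunk space, another variable in the auto dataset) as the outcome is our assumption, chosen only for illustration.

```stata
webuse auto, clear
pca price mpg rep78 headroom weight length displacement
predict pc1 pc2, score

* Fit a regression on the first two component scores
regress trunk pc1 pc2
```

The regression uses the same 69 observations as pca, because the scores are missing wherever any of the analysis variables is missing.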

The two components should be uncorrelated. The correlate command, like every other Stata command, is always available for use, so to verify that the correlation between pc1 and pc2 is zero, we type

. correlate pc1 pc2
(obs=69)

             |      pc1      pc2
-------------+------------------
         pc1 |   1.0000
         pc2 |   0.0000   1.0000
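
A related check, sketched here under the assumption that the scores are left in their default scaling (eigenvectors normalized to unit length, as the "sum of squares(column-loading) = 1" note indicates), is that the variance of each score equals the corresponding eigenvalue:

```stata
webuse auto, clear
pca price mpg rep78 headroom weight length displacement
predict pc1 pc2, score

* The variance of pc1 should be close to the first eigenvalue, 4.34052
quietly summarize pc1
display r(Var)

* The variance of pc2 should be close to the second eigenvalue, 1.01891
quietly summarize pc2
display r(Var)
```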