Stata

Principal components

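Stata's pca command estimates the parameters of principal-component models. In the example below, we use the 1978 automobile data and extract the principal components of eight variables.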
. webuse auto
(1978 Automobile Data)

. pca price mpg rep78 headroom weight length displacement foreign

Principal components/correlation                  Number of obs    =        69
                                                  Number of comp.  =         8
                                                  Trace            =         8
    Rotation: (unrotated = principal)             Rho              =    1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |       4.7823      3.51481             0.5978       0.5978
           Comp2 |       1.2675      .429638             0.1584       0.7562
           Comp3 |      .837857      .398188             0.1047       0.8610
           Comp4 |      .439668     .0670301             0.0550       0.9159
           Comp5 |      .372638      .210794             0.0466       0.9625
           Comp6 |      .161844     .0521133             0.0202       0.9827
           Comp7 |      .109731      .081265             0.0137       0.9964
           Comp8 |     .0284659            .             0.0036       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors) 

    --------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 
    -------------+------------------------------------------------------------
           price |   0.2324    0.6397   -0.3334   -0.2099    0.4974   -0.2815 
             mpg |  -0.3897   -0.1065    0.0824    0.2568    0.6975    0.5011 
           rep78 |  -0.2368    0.5697    0.3960    0.6256   -0.1650   -0.1928 
        headroom |   0.2560   -0.0315    0.8439   -0.3750    0.2560   -0.1184 
          weight |   0.4435    0.0979   -0.0325    0.1792   -0.0296    0.2657 
          length |   0.4298    0.0687    0.0864    0.1845   -0.2438    0.4144 
    displacement |   0.4304    0.0851   -0.0445    0.1524    0.1782    0.2907 
         foreign |  -0.3254    0.4820    0.0498   -0.5183   -0.2850    0.5401 
    --------------------------------------------------------------------------

    ------------------------------------------------
        Variable |    Comp7     Comp8 | Unexplained 
    -------------+--------------------+-------------
           price |   0.2165   -0.0891 |           0 
             mpg |   0.1625    0.0115 |           0 
           rep78 |  -0.0813    0.0065 |           0 
        headroom |   0.0226    0.0252 |           0 
          weight |   0.1104    0.8228 |           0 
          length |   0.5437   -0.4921 |           0 
    displacement |  -0.7733   -0.2608 |           0 
         foreign |  -0.1173    0.0639 |           0 
    ------------------------------------------------
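We typed pca price mpg ... foreign. All Stata commands share the same syntax: the command's name is followed by the names of the variables and then, optionally, by a comma and any options. For estimation commands such as regress, the dependent variable comes first and the independent variables follow; pca makes no such distinction and treats all the listed variables alike. In this case, we did not specify any options.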
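For readers who want to see what the first table summarizes, here is a minimal NumPy sketch of principal components computed from a correlation matrix: the eigenvalues of the correlation matrix, sorted largest first, and their eigenvectors. It is an illustration under stated assumptions, not Stata's implementation; the data are synthetic random numbers rather than the auto dataset, and the helper name pca_correlation is hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """Hypothetical helper: principal components of the correlation matrix of X.

    Returns the eigenvalues in descending order and the matching eigenvectors,
    one per column, in the same order as Comp1, Comp2, ... in the pca output.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)      # p x p correlation matrix of the columns
    evals, evecs = np.linalg.eigh(R)      # eigh suits symmetric R; returns ascending order
    order = np.argsort(evals)[::-1]       # reorder so the largest eigenvalue comes first
    return evals[order], evecs[:, order]


# Synthetic data, NOT the auto dataset: 69 observations on 8 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(69, 8))
X[:, 1] += 0.8 * X[:, 0]                  # make two of the variables correlated

evals, evecs = pca_correlation(X)
print("eigenvalues:", np.round(evals, 4))
print("trace:", evals.sum())              # sums to the number of variables (8), up to rounding
print("proportion:", np.round(evals / evals.sum(), 4))
```

Because a correlation matrix has ones on its diagonal, its trace, and hence the sum of its eigenvalues, equals the number of variables, which is why the output reports Trace = 8. Eigenvectors are determined only up to sign, so a column computed this way may be the negative of the one another program reports.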

Having estimated the principal components, we can at any time type pca by itself to redisplay the principal-component output. We can also type screeplot to obtain a scree plot of the eigenvalues, and we can use the predict command to obtain the components themselves.

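screeplot, typed by itself, graphs the eigenvalues against the component number. Because this analysis is based on the correlation matrix, whose trace is 8, each eigenvalue divided by 8 is the proportion of variance explained by that component, so the plot also shows how the explained variance is spread across the components: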


. screeplot
Figure 1
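The Proportion, Cumulative, and Difference columns of the first table follow directly from the eigenvalues. The Python sketch below recomputes them from the eight eigenvalues printed above, dividing each by the trace (8, the number of variables). Because the printed eigenvalues are rounded, the last digit of a recomputed Difference can disagree slightly with the table.

```python
from itertools import accumulate

# Eigenvalues copied from the pca output above, Comp1 through Comp8
eigenvalues = [4.7823, 1.2675, .837857, .439668, .372638, .161844, .109731, .0284659]

trace = len(eigenvalues)   # trace of a correlation matrix = number of variables (8)

# Difference: each eigenvalue minus the next one (undefined for the last)
difference = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

# Proportion: share of the total variance carried by each component
proportion = [ev / trace for ev in eigenvalues]

# Cumulative: running total of the proportions
cumulative = list(accumulate(proportion))

for k, ev in enumerate(eigenvalues):
    diff = f"{difference[k]:10.6g}" if k < len(difference) else f"{'.':>10}"
    print(f"Comp{k + 1}  {ev:9.6g}  {diff}  {proportion[k]:.4f}  {cumulative[k]:.4f}")
```

Since every proportion is an eigenvalue divided by the same constant, a plot of the proportions has exactly the shape of the scree plot of the eigenvalues.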

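Typing screeplot, yline(1) ci(het) adds a horizontal reference line at an eigenvalue of 1 and adds heteroskedastic bootstrap confidence intervals for the eigenvalues. A reference line at 1 is useful because 1 is the variance of a single standardized variable: a component whose eigenvalue falls below the line accounts for less variance than any one of the original variables does.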

. screeplot, yline(1) ci(het)
Figure 2
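One common rule of thumb, the Kaiser criterion, retains the components whose eigenvalues exceed 1, that is, the ones above the yline(1) reference line. The page does not prescribe this rule; the sketch below simply applies it to the eigenvalues printed above and reports the share of the total variance those components account for.

```python
# Eigenvalues copied from the pca output above, Comp1 through Comp8
eigenvalues = [4.7823, 1.2675, .837857, .439668, .372638, .161844, .109731, .0284659]
n_vars = len(eigenvalues)   # trace of the correlation matrix

# Kaiser criterion (a rule of thumb, not a Stata default): keep components whose
# eigenvalue exceeds 1, i.e. those plotted above the yline(1) reference line
retained = [k for k, ev in enumerate(eigenvalues, start=1) if ev > 1]
share = sum(eigenvalues[k - 1] for k in retained) / n_vars

print("components above the line:", retained)
print(f"share of total variance they explain: {share:.4f}")
```

Applied to this output, the rule keeps Comp1 and Comp2, whose share matches the Cumulative entry for Comp2 in the first table.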

We can obtain the first two components by typing

. predict pc1 pc2, score
(6 components skipped)

Scoring coefficients 
    sum-of-squares(column-loading) = 1

    --------------------------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 
    -------------+------------------------------------------------------------
           price |   0.2324    0.6397   -0.3334   -0.2099    0.4974   -0.2815 
             mpg |  -0.3897   -0.1065    0.0824    0.2568    0.6975    0.5011 
           rep78 |  -0.2368    0.5697    0.3960    0.6256   -0.1650   -0.1928 
        headroom |   0.2560   -0.0315    0.8439   -0.3750    0.2560   -0.1184 
          weight |   0.4435    0.0979   -0.0325    0.1792   -0.0296    0.2657 
          length |   0.4298    0.0687    0.0864    0.1845   -0.2438    0.4144 
    displacement |   0.4304    0.0851   -0.0445    0.1524    0.1782    0.2907 
         foreign |  -0.3254    0.4820    0.0498   -0.5183   -0.2850    0.5401 
    --------------------------------------------------------------------------

    ----------------------------------
        Variable |    Comp7     Comp8 
    -------------+--------------------
           price |   0.2165   -0.0891 
             mpg |   0.1625    0.0115 
           rep78 |  -0.0813    0.0065 
        headroom |   0.0226    0.0252 
          weight |   0.1104    0.8228 
          length |   0.5437   -0.4921 
    displacement |  -0.7733   -0.2608 
         foreign |  -0.1173    0.0639 
    ----------------------------------


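The score option tells Stata's predict command to compute the component scores, and pc1 and pc2 are the names we have chosen for the two new variables. The scoring coefficients that predict displays are the same as the eigenvectors reported by pca; each score is a weighted sum of the standardized variables, with those coefficients as the weights. We could have obtained the first three components by typing, for example, predict pc1 pc2 pc3, score.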
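A minimal NumPy sketch of what the score computation amounts to, under the assumption (consistent with the scoring-coefficient table matching the eigenvector table) that each score is the standardized variables multiplied by the corresponding eigenvector. The standardization here uses the sample mean and the n - 1 standard deviation; Stata's internal conventions may differ in detail, and the data are synthetic, not the auto dataset.

```python
import numpy as np

# Synthetic data, NOT the auto dataset: 69 observations on 8 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(69, 8))
X[:, 2] -= 0.6 * X[:, 0]                   # make two of the variables correlated

# Standardize each variable: subtract its mean, divide by its (n - 1) standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigenvectors of the correlation matrix, largest eigenvalue first
R = np.corrcoef(X, rowvar=False)
evals, evecs = np.linalg.eigh(R)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Scores for the first k components, analogous to: predict pc1 pc2, score
k = 2
coefficients = evecs[:, :k]                # scoring coefficients = eigenvectors
scores = Z @ coefficients                  # one column per component

print(scores.shape)
print("score variances:", scores.var(axis=0, ddof=1))
print("eigenvalues:    ", evals[:k])
```

Because the columns of Z are centered, each score has mean 0, and its sample variance equals the eigenvalue of its component.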

An important feature of Stata is that it does not have modes or modules. We typed pca to estimate the principal components. We then typed screeplot to see a graph of the eigenvalues; we did not have to save the data and change modules. Similarly, we typed predict pc1 pc2, score to obtain the scores of the first two components. The new variables, pc1 and pc2, are now part of our data and are ready for use; for example, we could now use regress to fit a regression model that includes them.

Scores of different principal components are uncorrelated by construction, so pc1 and pc2 should have correlation 0. We can check this with the correlate command, which, like every other Stata command, is always available for use. To verify that the correlation between pc1 and pc2 is zero, we type

. correlate pc1 pc2
(obs=69)

             |      pc1      pc2
-------------+------------------
         pc1 |   1.0000
         pc2 |   0.0000   1.0000
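Why the correlation is zero: a short derivation, assuming the scores are t_j = Z v_j, where Z is the n × p matrix of standardized variables (sample means and n - 1 standard deviations), R = Z'Z/(n - 1) is the correlation matrix, and the v_j are its orthonormal eigenvectors, with R v_j = λ_j v_j. Since the columns of Z have mean zero, every score has mean zero, so for two different components

```latex
\begin{aligned}
\operatorname{cov}(\mathbf{t}_i,\mathbf{t}_j)
  &= \frac{1}{n-1}\,\mathbf{t}_i^{\top}\mathbf{t}_j
   = \frac{1}{n-1}\,\mathbf{v}_i^{\top}\mathbf{Z}^{\top}\mathbf{Z}\,\mathbf{v}_j
   = \mathbf{v}_i^{\top}\mathbf{R}\,\mathbf{v}_j \\
  &= \lambda_j\,\mathbf{v}_i^{\top}\mathbf{v}_j = 0 \qquad (i \neq j).
\end{aligned}
```

Setting i = j in the same chain gives var(t_j) = λ_j, so each score's variance is the eigenvalue reported for its component. The 0.0000 printed by correlate is therefore zero up to floating-point rounding.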

See New in Stata 10 for more about what was added in Stata Release 10.

© Copyright 1996–2008 StataCorp LP