Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: RE: principal component analysis-creating linear combinations
Date: Thu, 10 Mar 2011 15:12:44 +0000
Not so: there is an explicit example of exactly what you need in the help.
Individual scores for the components are obtained via -predict-. For the first component only:
. predict f1
or, for the first two components:
. predict f1 f2
That is, for 2, 3, ... components, specify as many new variable names as you need.
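For readers who want to see what those new variables contain, here is a minimal NumPy sketch (Python, not Stata) of the arithmetic. It assumes the analysis is on the correlation matrix (the -pca- default), that each variable is standardized with its sample mean and sample standard deviation, and it uses simulated data. Each score variable is then the standardized data multiplied by one eigenvector, so f1 and f2 correspond to the first two columns. Eigenvector signs are arbitrary, so a given program may report a component with the opposite sign.

```python
import numpy as np

# Simulated data (illustration only): 659 observations on four related variables.
rng = np.random.default_rng(0)
n = 659
base = rng.normal(size=(n, 1))
X = base + rng.normal(scale=[0.3, 0.5, 1.0, 1.0], size=(n, 4))

# Standardize each column with its sample mean and sample SD (n - 1 denominator).
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decompose the correlation matrix; eigh returns ascending eigenvalues,
# so reverse the order to get Comp1, Comp2, ... by decreasing variance.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Scores for the first k components, one column per new variable
# (analogous to typing "predict f1 f2" when k = 2).
k = 2
scores = Z @ eigvecs[:, :k]
f1, f2 = scores[:, 0], scores[:, 1]
```

Each score has mean zero and variance equal to its eigenvalue, and the scores are mutually uncorrelated, which makes a quick check on any hand calculation.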
I am looking at Stata 11 documentation; if you are using an earlier version, you should state that as requested in the Statalist FAQ.
Nick
James Wu [mailto:[email protected]]
Nick, thank you very much.
But how can I obtain the second component scores (which would
correspond to what I earlier called Y2) by using predict?
I read the manual entry on pca postestimation, but it gives no indication
of this (only the first component scores).
On Thu, Mar 10, 2011 at 9:56 AM, Nick Cox <[email protected]> wrote:
> The easiest and best way to create the principal components themselves is to use -predict- after -pca-. There is no need for you to do the calculation by typing out coefficients in a linear equation. Even at best, that is problematic in terms of keeping precision.
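To illustrate the precision point, the sketch below (NumPy, simulated data, all names hypothetical) compares scores computed from full-precision eigenvectors with scores computed from coefficients rounded to the four decimals shown in -pca- output. The discrepancy is small but not zero, and it is entirely avoidable.

```python
import numpy as np

# Simulated data (illustration only).
rng = np.random.default_rng(3)
common = rng.normal(size=(659, 1))
X = common + rng.normal(scale=[0.2, 0.4, 0.8, 0.8], size=(659, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# First two eigenvectors of the correlation matrix at full precision
# (eigh sorts ascending, so reverse the columns before taking two).
vals, vecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
V = vecs[:, ::-1][:, :2]

exact = Z @ V               # full-precision scores
typed = Z @ np.round(V, 4)  # scores from coefficients retyped at 4 decimals

max_error = np.abs(exact - typed).max()
```

With full precision the two score columns are uncorrelated to machine accuracy; the retyped version only approximates that.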
>
> The default of -pca- is to use the correlation matrix; that is entirely equivalent to using standardised variables, so that there is absolutely no need to standardise yourself, except possibly as an exercise.
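The equivalence can be checked numerically. This NumPy sketch on simulated data (an assumption; any numeric data would do) computes eigenvalues and eigenvectors once from the correlation matrix of the raw variables and once from the covariance matrix of standardized variables. They agree up to rounding error and the arbitrary sign of each eigenvector.

```python
import numpy as np

# Simulated data (illustration only): 500 observations on four correlated variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))

# Route 1: eigen-decomposition of the correlation matrix of the raw variables.
vals_r, vecs_r = np.linalg.eigh(np.corrcoef(X, rowvar=False))

# Route 2: standardize first (mean 0, SD 1), then use the covariance matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
vals_z, vecs_z = np.linalg.eigh(np.cov(Z, rowvar=False))
```

In both routes the eigenvalues sum to 4, the number of variables, which is the "Trace" that -pca- reports.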
>
> I wouldn't call the eigenvectors the PCs myself, although there are varying habits on this.
James Wu
> Suppose we ran pca on four variables, x1, x2, x3, x4, as follows:
> . pca x1 x2 x3 x4, components(3)
>
> Principal components/correlation                  Number of obs    =       659
>                                                   Number of comp.  =         3
>                                                   Trace            =         4
>     Rotation: (unrotated = principal)             Rho              =    0.9550
>
>     --------------------------------------------------------------------------
>        Component |   Eigenvalue   Difference         Proportion   Cumulative
>     -------------+------------------------------------------------------------
>            Comp1 |      2.42894      1.67142             0.6072       0.6072
>            Comp2 |      .757515      .124084             0.1894       0.7966
>            Comp3 |      .633431      .453314             0.1584       0.9550
>            Comp4 |      .180117            .             0.0450       1.0000
>     --------------------------------------------------------------------------
>
> Principal components (eigenvectors)
>
>     ----------------------------------------------------------
>         Variable |    Comp1     Comp2     Comp3 | Unexplained
>     -------------+------------------------------+-------------
>               x1 |   0.3894    0.8726   -0.2945 |   .00004265
>               x2 |   0.4517    0.0966    0.8858 |    .0003491
>               x3 |   0.5733   -0.3179   -0.2218 |      .09384
>               x4 |   0.5619   -0.3580   -0.2817 |      .08588
>     ----------------------------------------------------------
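The summary columns of that output follow directly from the eigenvalues. With four standardized variables the trace is 4, so Proportion is each eigenvalue divided by 4, Cumulative is the running sum of those proportions, Difference is the gap to the next eigenvalue, and Rho is the cumulative proportion for the three retained components. A short Python check using only the printed eigenvalues:

```python
# Eigenvalues as printed in the -pca- output above.
eigenvalues = [2.42894, 0.757515, 0.633431, 0.180117]
trace = 4  # four standardized variables, each with variance 1

# Share of total variance for each component, and the running total.
proportion = [ev / trace for ev in eigenvalues]
cumulative = [sum(proportion[: i + 1]) for i in range(len(proportion))]

# Gap between each eigenvalue and the next one (none for the last).
difference = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]

# Rho: cumulative proportion for the retained components (components(3)).
rho = cumulative[2]
```

Rounded to four decimals these reproduce the Proportion and Cumulative columns and Rho = 0.9550.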
>
>
> Now, suppose that you decide to retain the first two principal
> components, and then you want to create two variables that are linear
> combinations of the original four variables.
>
> Question 1: Would it simply be a matter of multiplying the principal
> components (the eigenvectors, i.e. the columns above) by the original
> variables, say,
> Y1 = 0.3894*x1 + 0.4517*x2 + 0.5733*x3 + 0.5619*x4 and
> Y2 = 0.8726*x1 + 0.0966*x2 - 0.3179*x3 - 0.3580*x4?
>
> Question 2: Assuming that I am correct in creating new variables by
> simply multiplying the principal components (eigenvectors) by the
> original variables (Question 1),
> if these four original variables are in different units of
> measurement, should we standardize them (so that each standardized
> variable has mean 0 and standard deviation 1) before computing the
> products as in Question 1?
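Why standardization matters when units differ can be seen by rescaling one variable. In this NumPy sketch (simulated data; the factor of 1000 stands in for a hypothetical change from kilograms to grams, and first_pc is a helper written for the illustration), the leading eigenvector of the covariance matrix swings almost entirely onto the rescaled variable, while the leading eigenvector of the correlation matrix, the -pca- default, is unchanged.

```python
import numpy as np

def first_pc(S):
    """Leading eigenvector of symmetric S, signed so its largest entry is positive."""
    vals, vecs = np.linalg.eigh(S)
    v = vecs[:, np.argmax(vals)]
    return v if v[np.argmax(np.abs(v))] > 0 else -v

# Simulated data (illustration only): four positively related variables.
rng = np.random.default_rng(2)
common = rng.normal(size=(500, 1))
X = common + rng.normal(scale=0.5, size=(500, 4))

# Re-express x1 in units 1000 times smaller (e.g. kilograms -> grams).
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000

# Covariance-based PCA: the leading eigenvector depends on the units.
v_cov_before = first_pc(np.cov(X, rowvar=False))
v_cov_after = first_pc(np.cov(X_rescaled, rowvar=False))

# Correlation-based PCA: unaffected by the change of units.
v_cor_before = first_pc(np.corrcoef(X, rowvar=False))
v_cor_after = first_pc(np.corrcoef(X_rescaled, rowvar=False))
```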
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>