Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: principal component analysis-creating linear combinations |
Date | Thu, 10 Mar 2011 14:56:43 +0000 |
The easiest and best way to create the principal components themselves is to use -predict- after -pca-. There is no need for you to do the calculation by typing out coefficients in a linear equation. Even at best, that is problematic in terms of keeping precision.

The default of -pca- is to use the correlation matrix; that is entirely equivalent to using standardised variables, so there is absolutely no need to standardise yourself, except possibly as an exercise.

I wouldn't call the eigenvectors the PCs myself, although there are varying habits on this.

Nick
n.j.cox@durham.ac.uk

James Wu

Suppose we ran pca on four variables, x1, x2, x3, x4, as follows:

. pca x1 x2 x3 x4, components(3)

Principal components/correlation                  Number of obs    =       659
                                                  Number of comp.  =         3
                                                  Trace            =         4
    Rotation: (unrotated = principal)             Rho              =    0.9550

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      2.42894      1.67142             0.6072       0.6072
           Comp2 |      .757515      .124084             0.1894       0.7966
           Comp3 |      .633431      .453314             0.1584       0.9550
           Comp4 |      .180117            .             0.0450       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ----------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3 | Unexplained
    -------------+------------------------------+-------------
              x1 |   0.3894    0.8726   -0.2945 |   .00004265
              x2 |   0.4517    0.0966    0.8858 |    .0003491
              x3 |   0.5733   -0.3179   -0.2218 |      .09384
              x4 |   0.5619   -0.3580   -0.2817 |      .08588
    ----------------------------------------------------------

Now, suppose that you decide to retain the first two principal components, and then you want to create two variables that are linear combinations of the original four variables.

Question 1: Would it simply be a matter of multiplying the principal components (eigenvectors, columns) by the original variables, say,

    Y1 = 0.3894*x1 + 0.4517*x2 + 0.5733*x3 + 0.5619*x4

and

    Y2 = 0.8726*x1 + 0.0966*x2 - 0.3179*x3 - 0.3580*x4?

Question 2: Assuming that I am correct in creating new variables by simply multiplying the principal components (eigenvectors) by the original variables (Question 1), if these four original variables are in different units of measurement, should we standardize the original four variables (so that each standardized variable has mean 0 and standard deviation 1) before computing the products as in Question 1?
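
A minimal sketch of the route recommended in the reply above, with the by-hand calculation shown only as a check. It assumes x1-x4 are in memory as in the question; the names pc1, pc2, z1-z4, y1 and diff1 are hypothetical and chosen here for illustration. The by-hand version reads the eigenvectors from e(L) at full precision rather than retyping the four-decimal coefficients from the display.

```stata
* Sketch only: assumes x1-x4 are in memory, as in the question.
* pc1, pc2, z1-z4, y1 and diff1 are hypothetical new variable names.
pca x1 x2 x3 x4, components(3)

* Recommended route: let -predict- compute scores for the first two PCs.
predict double pc1 pc2, score

* By-hand route, for comparison only: standardise each variable over the
* estimation sample, then weight by the first eigenvector stored in e(L).
forvalues j = 1/4 {
    egen double z`j' = std(x`j') if e(sample)
}
matrix L = e(L)
generate double y1 = L[1,1]*z1 + L[2,1]*z2 + L[3,1]*z3 + L[4,1]*z4 if e(sample)

* The two versions should agree up to rounding error.
generate double diff1 = pc1 - y1
summarize diff1
```

If the differences summarised at the end are not negligible, the likeliest cause is that the standardisation was done over a different sample from the one -pca- used.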