Statalisters,

Thanks to Kit Baum, my minap program has been added to SSC. minap
calculates the minimum average partial correlation criterion for the
number of principal components to extract. Velicer (1976) proposed that,
when principal components analysis is used as a version of factor
analysis, the number of components to extract is the number m at which
the average partial correlation among the variables, after partialling
out the first m principal components, reaches its minimum. minap
calculates this criterion. It can take as input either a variable list
or a correlation matrix.
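
For anyone who wants the criterion spelled out, here is a sketch in the usual notation. The symbols (R, A_m, C_m, D_m, p) are my own labels for illustration and are not taken from the minap help file. Note that Velicer's criterion, as it is usually stated, averages the squared partial correlations; whether minap reports exactly this quantity should be checked against its documentation.

```latex
% R   : p x p correlation matrix of the variables
% A_m : p x m matrix of loadings on the first m principal components
\[
C_m = R - A_m A_m^{\top}, \qquad
R^{*}_m = D_m^{-1/2} \, C_m \, D_m^{-1/2}, \qquad
D_m = \operatorname{diag}(C_m)
\]
\[
\mathrm{MAP}_m = \frac{1}{p(p-1)} \sum_{i \neq j} \bigl( r^{*}_{ij,m} \bigr)^{2},
\qquad m = 0, 1, \dots, p-1, \quad C_0 = R
\]
\[
\hat{m} = \arg\min_{m} \, \mathrm{MAP}_m
\]
```

The suggested number of components is the m at which MAP_m is smallest.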

Many criteria for estimating the number of components in principal
components analysis, or of factors in factor analysis, have been proposed
(Gorsuch, 1983). One relatively little-used criterion is the minimum
average partial correlation proposed by Velicer (1976). The minap
criterion is useful when principal components analysis is being used as
an approximation to factor analysis, as with the pcf option to Stata's
factor command. Gorsuch also points out that, while the criterion was
developed for principal components analysis, it may also be useful for
common factor analysis.

This criterion has performed well in simulation studies using data with
a relatively clear factor structure (Zwick & Velicer, 1986). Gorsuch
(1983), however, warns that the minimum average partial correlation may
not perform well, and may suggest underextraction, when there are
components or factors with only a few loadings. Similarly, in many
applications of principal components analysis, one may be interested in
components on which only one or two variables load. minap would be
inappropriate in those cases.

For comparison purposes, minap also reports the number of eigenvalues
greater than one, claimed by Kaiser (1960) to be a good estimator of the
number of components to extract. In most cases, this rule will recommend
extracting more components than minap will, and Zwick and Velicer (1986)
claim that it leads to overextraction.
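
To see the eigenvalue-greater-than-one count by hand, something like the following should do. It is only a sketch, not the code inside minap: it assumes the matrix Harmon has already been created by running the harmon program given later in this post, and the names V, lambda, k, and p are mine.

```stata
* Count the eigenvalues of a correlation matrix that exceed one.
* Assumes matrix Harmon already exists (run the harmon program first).
matrix symeigen V lambda = Harmon     /* V: eigenvectors, lambda: eigenvalues */
matrix list lambda

local k = 0
local p = colsof(lambda)
forvalues j = 1/`p' {
    if lambda[1,`j'] > 1 {
        local k = `k' + 1
    }
}
display "Number of eigenvalues greater than one: `k'"
```

Any other symmetric correlation matrix can be substituted for Harmon.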

It should be noted that no single criterion can be counted on to
determine the number of components or factors to extract with real data.
Considerations of interpretability (as Nick Cox points out) are also
important. In general, determining the precise number of components to
retain matters more when the component (or factor) solution will be
rotated.

While I certainly don't take this criterion as gospel, I find it useful.
For example, in a dataset I'm working on, theory strongly suggests 5
components and the eigenvalue > 1 rule suggests 7, but the minap
procedure suggests 4. Lo and behold, the 4-component solution seems
preferable in several ways to the 5 (or 7).

If anyone wants to test it, I've included a little program that creates
a matrix containing the correlations from a classic data set of Harman's
that Velicer analyzes in his paper:

capture program drop harmon
program define harmon
    ************************************************************************
    ** Creates matrix Harmon for testing map.do                           **
    ** Correlations among 8 physical variables for 305 girls              **
    ** Source: Harman (1976), p. 22                                       **
    ** MAP results for these data appear in Velicer (1976)                **
    ************************************************************************
    mat Harmon = /*
    */ [1.000,  .846,  .805,  .859,  .473,  .398,  .301,  .382\ /*
    */   .846, 1.000,  .881,  .826,  .376,  .326,  .277,  .415\ /*
    */   .805,  .881, 1.000,  .801,  .380,  .319,  .237,  .345\ /*
    */   .859,  .826,  .801, 1.000,  .436,  .329,  .327,  .365\ /*
    */   .473,  .376,  .380,  .436, 1.000,  .762,  .730,  .629\ /*
    */   .398,  .326,  .319,  .329,  .762, 1.000,  .583,  .577\ /*
    */   .301,  .277,  .237,  .327,  .730,  .583, 1.000,  .539\ /*
    */   .382,  .415,  .345,  .365,  .629,  .577,  .539, 1.000]
    mat list Harmon
end
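
As a cross-check on whatever minap reports, the average squared partial correlation can also be computed directly with Stata's matrix commands. The following is only a rough sketch under my own assumptions, not the code in minap: it uses the squared-partial form of Velicer's criterion, and the names R, V, L, Vm, Lm, C, S, P, PP, and map are all made up for the illustration.

```stata
* Sketch: average squared partial correlation after removing the first
* m principal components, for m = 0 to p-1. Not minap's own code.
harmon                                /* creates (and lists) matrix Harmon */
matrix R = Harmon
local p = rowsof(R)
matrix symeigen V L = R               /* eigenvalues in L, largest first */

* m = 0: nothing partialled out, so use the correlations themselves
matrix PP = R*R'
scalar map = (trace(PP) - `p') / (`p'*(`p' - 1))
display "m = 0   average squared partial r = " %6.4f map

local last = `p' - 1
forvalues m = 1/`last' {
    matrix Vm = V[1..., 1..`m']       /* first m eigenvectors */
    matrix Lm = L[1, 1..`m']          /* first m eigenvalues */
    matrix Lm = diag(Lm)
    matrix C  = R - Vm*Lm*Vm'         /* partial covariance matrix */
    * rescale C to partial correlations: P = S*C*S, S = diag(1/sqrt(C_ii))
    matrix S = J(`p', `p', 0)
    forvalues i = 1/`p' {
        matrix S[`i',`i'] = 1/sqrt(C[`i',`i'])
    }
    matrix P  = S*C*S
    matrix PP = P*P'                  /* trace(PP) = sum of squared entries */
    scalar map = (trace(PP) - `p') / (`p'*(`p' - 1))
    display "m = `m'   average squared partial r = " %6.4f map
}
```

The value of m with the smallest displayed average is the number of components this form of the criterion would suggest.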
Bug reports, etc., to me.
Cheers,
Stephen
Stephen Soldz
The Center for Research, Evaluation, and Program Development
Boston Graduate School of Psychoanalysis
1581 Beacon St.
Brookline, MA 02446
[email protected]