Statalist



RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?
Date   Tue, 18 Aug 2009 17:18:53 +0100

I guess Cameron does not mean quite what he says, which is that factor
analysis can only be used on psychometric measures. In principle I can
readily imagine fruitful applications on quite different kinds of data.
But I largely agree with the spirit of his comment, which I take to be
-- my words not his -- that expecting factor analysis to see structure
in a mess independently of some understanding is likely to be expecting
far too much. However, my impression is that that is exactly what almost all
users of factor analysis seem to expect!

I've found occasional use of PCA in the following way. 

1. Plot the data. 

2. Calculate correlations, etc. 

3. Look at the results: get some ideas. 

4. Calculate PCs. 

5. Use PCs to help structure understanding of #1 and #2 in terms of
variables that go together, variables that are singletons, etc.
Sometimes, results of #1 and #2 now make more sense in their own terms.
(For example, a reordering of a scatter plot matrix or correlation
matrix makes it easier to see what is going on.) Often it is useful here
to look at a table of correlations between original variables and new
PCs. -cpcorr- from SSC helps with that. 

6. Now discard PC results and proceed with modelling. 

As in some fields every minor variation on a technique is blessed with a
name, I'll dub this disposable principal component analysis. 
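
Steps 2, 4 and 5 above might be sketched as follows, in Python rather than Stata purely for illustration. The data are made up (x1 and x2 are built to move together, x3 to stand alone) and the helper names are hypothetical. The first PC is approximated by power iteration on the correlation matrix, which is a stand-in chosen to keep the sketch self-contained, not a method described in this post. The final table is the kind of variable-by-PC correlation table that step 5 refers to.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(col):
    """Centre a column and scale it to unit standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]

def corr(a, b):
    """Pearson correlation of two equal-length columns."""
    za, zb = standardize(a), standardize(b)
    return sum(u * v for u, v in zip(za, zb)) / (len(a) - 1)

def leading_eigenvector(R, iterations=200):
    """Approximate the leading eigenvector of R by power iteration."""
    p = len(R)
    v = [1.0 / sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up data (an assumption): x1 and x2 go together, x3 is a singleton.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "x3": [3, 5, 1, 6, 2, 4],
}
names = list(data)

# Step 2: correlation matrix of the original variables.
R = [[corr(data[a], data[b]) for b in names] for a in names]

# Step 4: scores on the first PC of the correlation matrix.
weights = leading_eigenvector(R)
z = {name: standardize(data[name]) for name in names}
pc1 = [sum(w * z[name][i] for w, name in zip(weights, names))
       for i in range(len(data["x1"]))]

# Step 5: correlations between the original variables and PC1.
pc1_corr = {name: corr(data[name], pc1) for name in names}
for name in names:
    print(f"{name}: {pc1_corr[name]:.3f}")
```

Variables that correlate strongly with the same PC (here x1 and x2) are candidates to sit next to each other in a reordered scatter plot matrix or correlation matrix. After that the PC results can be discarded, as in step 6.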

Nick 
[email protected] 

Cameron McIntosh

Adrian, I think it would be a complete travesty to just feed that whole
dataset into a factor analysis. Sure, it'll lump together variables with
high correlations, but most of the time this doesn't reflect what's
going on underneath the data (e.g., a web of direct and indirect causal
relations that generated the observed associations/covariance matrix),
and this type of situation is what tends to give factor analysis a "bad
name" among statisticians. Factor analysis is typically only appropriate
for reflective psychometric measures written specifically to assess an
underlying trait (e.g., self-esteem, anxiety), not datasets like yours.
I think there are probably complex causal relations among your variables
that you should think hard about (using your theoretical knowledge about
these variables) and maybe come up with a path-analytic model or growth
curve model (say, GDP trajectory and its predictors). You could also
compare models across countries.

From: [email protected]

> Thank you to Cameron, Bob and everybody else for the references.
>
> I have a response to Jay and a couple more questions for everybody, if
you can still help me...

Jay wrote:
>> Before you go any further I think you have a big problem to consider:
100 variables on, say 200 countries means you have WAY more covariances
(or correlations) than you have countries. This means your correlation
matrix is singular.
>
>
> I don't think I have that problem because I don't have 200 countries.
I only have about 30+ countries.
>
> However, even if I had 200 countries, I don't understand exactly what
the problem would be because I have all 100 variables for country i and
all 100 variables for country j stacked on one another. So, I have:
>
> country year GDP inflation reserves
> Argentina 1990 2.3 6.4 100
> Argentina 1991 2.8 7.4 250
> Argentina 1992 2.6 7.0 200
> ...
> Argentina 2006 3.2 8.0 400
> Brazil 1990 1.7 5.4 120
> Brazil 1991 2.1 6.3 140
> Brazil 1992 2.5 7.0 180
> ...
>
>
> So the variables I enter into my factor analysis are GDP, inflation,
and reserves... and so the -factor- command in Stata knows nothing about
the panel/time-series structure of my data. I can see why it should be
relevant to account for the underlying panel structure of the data --
for instance, that jump in GDP/inflation/reserves and any other
variables between Argentina in 2006 and Brazil in 1990 may be a bit
strange to account for.
>
> So, the first question is: do I need to take this panel structure into
account? And if so, how?
>
> The other question is, do units matter? For instance, I know that
factor analysis or PCA are all based on a variance-covariance matrix...
but if I have two variables, x and y, and I take the covariance between
the two of them, that'll be different than if I take the covariance of,
say 2x and y:
>
> cov(x,y) <> cov(2x,y)
>
> and so what would happen if I express my GDP in dollars for all
countries or in local-currency units?? Or in millions or in billions???
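
The arithmetic behind the units question can be checked on the four Argentina rows shown above. This is a minimal sketch in Python, assuming a rescaling factor of 1000 as a hypothetical change of units (say, billions to millions). The helper functions are illustrative and are not part of any Stata command. Rescaling a variable multiplies its covariances by the same factor but leaves its correlations unchanged, so an analysis based on the correlation matrix does not depend on the units, while one based on the covariance matrix does.

```python
from math import sqrt
from statistics import mean

def cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    """Pearson correlation, computed from the covariances."""
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

# GDP and inflation for Argentina (1990, 1991, 1992, 2006), from the rows above.
gdp = [2.3, 2.8, 2.6, 3.2]
inflation = [6.4, 7.4, 7.0, 8.0]

# Hypothetical change of units: the factor 1000 is an assumption.
gdp_rescaled = [1000 * g for g in gdp]

# The covariance scales with the units; the correlation does not.
print("cov :", cov(gdp, inflation), cov(gdp_rescaled, inflation))
print("corr:", corr(gdp, inflation), corr(gdp_rescaled, inflation))
```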



