Adrian wrote:
>>However, even if I had 200 countries, I don't understand exactly what the problem would be, because I have all 100 variables for country i and all 100 variables for country j stacked on top of one another. So, I have: <snip>
Oh, you have panel data. This is definitely different, because errors are correlated within cases (the repeated yearly observations on each country).
So the variables I enter into my factor analysis are GDP, inflation, reserves, and so on, and Stata's -factor- command knows nothing about the panel/time-series structure of my data. I can see why the underlying panel structure might matter: for instance, the jump in GDP, inflation, reserves, and the other variables between Argentina in 2006 and Brazil in 1990 seems strange to treat as ordinary variation.
So, the first question is: do I need to take this panel structure into account? And if so, how?<<
There are panel-type models for factor analysis and PCA. The big issue I see is that you have far more variables (about 100) than most of those models are likely to handle comfortably. Hmmm. Well, if you can do some data reduction yourself, by grouping related variables and forming linear composites (sum scores), you could probably formulate a path-analysis-type model or a dynamic factor model that properly handles the autocorrelation. http://faculty.psy.ohio-state.edu/browne/ has some articles and software on dynamic factor models.
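The data-reduction step can be sketched roughly as follows, again in Python rather than Stata. The sketch standardizes each variable across all country-years and then sums the z-scores within each group to get a unit-weighted composite. The variable names, the grouping, and the values are all hypothetical. Choosing sensible groups is the substantive part of the work.

```python
from statistics import fmean, pstdev


def zscores(values):
    """Standardize one variable to mean 0 and SD 1 across all country-years."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def sum_score_composites(data, groups):
    """Unit-weighted composite (sum of z-scores) for each group of variables.

    data:   variable name -> list of values, one per country-year row
    groups: composite name -> list of variable names belonging to it
    """
    z = {name: zscores(col) for name, col in data.items()}
    n_rows = len(next(iter(data.values())))
    return {
        comp: [sum(z[var][i] for var in members) for i in range(n_rows)]
        for comp, members in groups.items()
    }


# Hypothetical variables and grouping, four country-year rows.
data = {
    "gdp_growth": [2.1, 3.4, -0.5, 1.8],
    "reserves": [40.0, 55.0, 20.0, 35.0],
    "inflation": [3.0, 2.5, 9.0, 4.1],
    "debt": [60.0, 45.0, 80.0, 70.0],
}
groups = {"activity": ["gdp_growth", "reserves"], "stress": ["inflation", "debt"]}
print(sum_score_composites(data, groups))
```

Standardizing first keeps any one variable from dominating a composite just because of its units. If a variable runs in the opposite direction from the others in its group, it should be reverse-coded before summing.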
>>The other question is: do units matter? I know that factor analysis and PCA are based on a variance-covariance matrix, but if I have two variables, x and y, their covariance will differ from the covariance of, say, 2x and y: cov(x,y) <> cov(2x,y). So what would happen if I expressed GDP in dollars for all countries, or in local-currency units? Or in millions versus billions?<<
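This is another problem. In the factor analysis world it's usually wise to analyze covariances rather than correlations (for a number of technical reasons). However, that really depends on the variables being on comparable scales: rescaling GDP from millions to billions shrinks its variance by a factor of a million, which changes the covariance matrix and therefore the solution. So you may have to convert to standardized values just to be able to form reasonable linear composites.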
Sounds like you've got your work cut out for you: this is much more complex than a simple factor analysis or PCA.