Thanks Jay for the detailed answer.
Regarding the additional variable, here are the tetrachoric correlations
with it included:
           Var1        Var2        Var3        Var4   Var5
Var1          1
Var2   .1819233           1
Var3   .3699331   .25242738           1
Var4  .18371493   .27407531   .40299934           1
Var5      .0202      -.0033      -.0687      -.0637      1
As you can see, Var5 is essentially uncorrelated with the other 4 variables.
If I run factormat on all 5 variables, asking for 5 factors, I get:
factormat R, n(6926) ipf factor(5)
(obs=6926)
Factor analysis/correlation Number of obs = 6926
Method: iterated principal factors Retained factors = 4
Rotation: (unrotated) Number of params = 10
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 1.28517 1.03609 0.7553 0.7553
Factor2 | 0.24908 0.11225 0.1464 0.9017
Factor3 | 0.13684 0.10617 0.0804 0.9821
Factor4 | 0.03067 0.03086 0.0180 1.0001
Factor5 | -0.00019 . -0.0001 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(10) = 3087.75 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
---------------------------------------------------------------------
Variable | Factor1 Factor2 Factor3 Factor4 | Uniqueness
-------------+----------------------------------------+--------------
Var1 | 0.4863 0.3566 0.0300 -0.0355 | 0.6342
Var2 | 0.4111 -0.0833 0.2339 -0.0944 | 0.7604
Var3 | 0.7321 0.0487 -0.1889 0.0323 | 0.4249
Var4 | 0.5833 -0.2811 0.0678 0.0679 | 0.5716
Var5 | -0.0592 0.1832 0.2023 0.1218 | 0.9072
---------------------------------------------------------------------
The loadings for the 5th variable are very low (the one on the first factor
is even negative), and the first factor still seems to explain 75.5% of the
common variance.
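
As a quick arithmetic check on those figures, here is a minimal sketch in Python (not Stata). It assumes the Proportion column is each eigenvalue divided by the sum of all five listed eigenvalues, and that Uniqueness is 1 minus the sum of squared loadings on the retained factors. Both assumptions reproduce the printed values; the variable and function names are made up for illustration.

```python
# Eigenvalues copied from the five-variable IPF output above.
eigenvalues_5 = [1.28517, 0.24908, 0.13684, 0.03067, -0.00019]


def proportions(eigenvalues):
    """Share of the summed eigenvalues attributed to each factor."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


first_share = proportions(eigenvalues_5)[0]

# Var5 loadings on the four retained factors, also from the output above.
var5_loadings = [-0.0592, 0.1832, 0.2023, 0.1218]
communality = sum(l * l for l in var5_loadings)  # variance shared with the factors
uniqueness = 1.0 - communality                   # variance left unexplained

print(round(first_share, 4))  # 0.7553
print(round(communality, 4))  # 0.0928
print(round(uniqueness, 4))   # 0.9072
```

So the four retained factors together account for only about 9% of the variance of Var5, which is consistent with it being nearly uncorrelated with the other items.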
Francesco.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Verkuilen, Jay
Sent: Wednesday, 23 December 2009 22:40
To: '[email protected]'
Subject: st: RE: R: RE: RE: Factor Analysis: which explained variance?
Francesco Burchi wrote:
>>@ Jay
The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.<<
OK, then it makes a lot of sense to generate a sum score from this.
>>The following are the results of tetrachoric correlation:
           Var1        Var2        Var3   Var4
Var1          1
Var2   .1819233           1
Var3   .3699331   .25242738           1
Var4  .18371493   .27407531   .40299934      1<<
Thanks. Eyeballing this you have a positive manifold and some differences
between different items. A one factor model is likely to be appropriate.
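
"Positive manifold" here just means that every off-diagonal correlation is positive. A minimal sketch of that check in Python (not Stata), using the six correlations quoted above; the dictionary layout and names are illustrative only:

```python
# Off-diagonal entries of the four-variable tetrachoric matrix quoted above.
lower = {
    ("Var2", "Var1"): 0.1819233,
    ("Var3", "Var1"): 0.3699331,
    ("Var3", "Var2"): 0.25242738,
    ("Var4", "Var1"): 0.18371493,
    ("Var4", "Var2"): 0.27407531,
    ("Var4", "Var3"): 0.40299934,
}

# A positive manifold requires every pairwise correlation to be above zero.
positive_manifold = all(r > 0 for r in lower.values())

# The weakest and strongest pairs show how much the items differ.
smallest = min(lower, key=lower.get)
largest = max(lower, key=lower.get)

print(positive_manifold)            # True
print(smallest, lower[smallest])    # ('Var2', 'Var1') 0.1819233
print(largest, lower[largest])      # ('Var4', 'Var3') 0.40299934
```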
>>I was specifically asked whether I could justify my choice of one single
factor on the basis of the variance explained. Following your reasoning, I
could argue that with more than 1 factor it would be unidentified. Just to
be sure about the procedure I am following, I have tried to get results
keeping the 4 factors:
factormat R, n(6926) ipf factor(4)
Factor analysis/correlation Number of obs = 6926
Method: iterated principal factors Retained factors = 3
Rotation: (unrotated) Number of params = 6
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 1.28200 1.06199 0.8049 0.8049
Factor2 | 0.22001 0.12912 0.1381 0.9431
Factor3 | 0.09089 0.09108 0.0571 1.0001
Factor4 | -0.00019 . -0.0001 1.0000
--------------------------------------------------------------------------
Could I state that the first factor explains 80% of the common variance?<<
Yes, it's pretty clearly one dimensional, with the rest being junk that
happens with item-level factor analysis. The uniquenesses associated with
the loadings are totally in line with . I also ran the ML factor analysis
using:
. factormat R, n(6296) ml factors(1) names(v1 v2 v3 v4)
(obs=6296)
Iteration 0: log likelihood = -216.46349
Iteration 1: log likelihood = -65.941751
Iteration 2: log likelihood = -63.980616
Iteration 3: log likelihood = -63.905495
Iteration 4: log likelihood = -63.90257
Iteration 5: log likelihood = -63.902458
Factor analysis/correlation Number of obs = 6296
Method: maximum likelihood Retained factors = 1
Rotation: (unrotated) Number of params = 4
Schwarz's BIC = 162.796
Log likelihood = -63.90246 (Akaike's) AIC = 135.805
--------------------------------------------------------------------------
Factor | Eigenvalue Difference Proportion Cumulative
-------------+------------------------------------------------------------
Factor1 | 1.20010 . 1.0000 1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(6) = 2727.68 Prob>chi2 = 0.0000
LR test: 1 factor vs. saturated: chi2(2) = 127.75 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
---------------------------------------
Variable | Factor1 | Uniqueness
-------------+----------+--------------
v1 | 0.4583 | 0.7900
v2 | 0.3732 | 0.8607
v3 | 0.7583 | 0.4250
v4 | 0.5252 | 0.7242
---------------------------------------
The chi square tests for this sample size are rather silly; ignore them. The
loadings and uniquenesses are almost the same as for IPF (interestingly
enough---that's not always true). It won't run anything higher dimensional
but I doubt from looking at that tetrachoric correlation matrix you'd find
anything.
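
One way to see why the significant "1 factor vs. saturated" test need not be worrying: under a one-factor model the implied correlation between two items is the product of their loadings, so the residuals can be computed by hand. The sketch below is in Python (not Stata) and uses only the ML loadings and the tetrachoric correlations quoted in this thread; the names are illustrative.

```python
from itertools import combinations

# ML one-factor loadings from the output above.
loadings = {"v1": 0.4583, "v2": 0.3732, "v3": 0.7583, "v4": 0.5252}

# Observed tetrachoric correlations for the same four items.
observed = {
    ("v1", "v2"): 0.1819233,
    ("v1", "v3"): 0.3699331,
    ("v1", "v4"): 0.18371493,
    ("v2", "v3"): 0.25242738,
    ("v2", "v4"): 0.27407531,
    ("v3", "v4"): 0.40299934,
}

# With a single factor, uniqueness is 1 minus the squared loading.
uniqueness = {v: 1.0 - l * l for v, l in loadings.items()}

# Residual = observed correlation minus the correlation implied by the model.
residuals = {
    (a, b): observed[(a, b)] - loadings[a] * loadings[b]
    for a, b in combinations(loadings, 2)
}
largest_residual = max(abs(r) for r in residuals.values())

for v, u in uniqueness.items():
    print(v, round(u, 4))          # 0.79, 0.8607, 0.425, 0.7242
for pair, r in residuals.items():
    print(pair, round(r, 4))
print(round(largest_residual, 4))  # 0.0781
```

The largest residual is about 0.08 in absolute value (v2 with v4), so the one-factor model reproduces the correlations fairly closely even though the test rejects it with over 6000 observations.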
>>Finally, I have tried to add one or two further indicators to improve the
analysis. However, I had some theoretical doubts about including these
variables, and the factor analysis with tetrachoric correlations gave
loadings for them much lower than 0.1, so I decided to use only 4
variables.<<
Are the tetrachoric correlations for the other two variables markedly lower
or still meaningful? You might have an oblique two-factor solution.
Jay
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/