Francesco Burchi wrote:
>>@ Jay
The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.<<
OK, then it makes good sense to generate a sum score from these items.
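A minimal sketch of such a sum score, assuming the four binary items are stored as v1 to v4 (placeholder variable names, borrowed from the names() option used later in this thread; knowledge_sum and n_missing are likewise made up for illustration):

```stata
* Hypothetical variable names: v1-v4 are the four 0/1 knowledge items
* rowtotal() treats missing values as 0, so count the missing items as well
egen knowledge_sum = rowtotal(v1 v2 v3 v4)
egen n_missing     = rowmiss(v1 v2 v3 v4)
* Keep the score only for respondents who answered all four items
replace knowledge_sum = . if n_missing > 0
```

Whether to drop incomplete cases or to average over the answered items is a judgment call that depends on how much is missing.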
>>The following are the results of tetrachoric correlation:
          Var1        Var2        Var3        Var4
Var1      1
Var2      .1819233    1
Var3      .3699331    .25242738   1
Var4      .18371493   .27407531   .40299934   1<<
Thanks. Eyeballing this, you have a positive manifold (every correlation is positive) with some variation in how strongly the items hang together. A one-factor model is likely to be appropriate.
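For anyone following along, the correlations above could be entered by hand and passed to factormat roughly as follows. This is only a sketch: the matrix name R and n(6926) are taken from the command quoted further down, the names v1 to v4 are placeholders, and the /// line continuations only work in a do-file.

```stata
* Tetrachoric correlations from the post, entered as a full symmetric matrix
matrix R = (1,         .1819233,  .3699331,  .18371493 \ ///
            .1819233,  1,         .25242738, .27407531 \ ///
            .3699331,  .25242738, 1,         .40299934 \ ///
            .18371493, .27407531, .40299934, 1)
matrix list R

* One-factor iterated principal factor solution on that matrix
* n() is assumed here; it is copied from the quoted command
factormat R, n(6926) ipf factors(1) names(v1 v2 v3 v4)
```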
>>I was specifically asked whether I could justify my choice of one single
factor on the basis of the variance explained. Following your reasoning, I
could argue that with more than 1 factor it would be unidentified. Just to
be sure about the procedure I am following, I have tried to get results
keeping the 4 factors:
factormat R, n(6926) ipf factor(4)
Factor analysis/correlation                    Number of obs    =     6926
Method: iterated principal factors             Retained factors =        3
Rotation: (unrotated)                          Number of params =        6
--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      1.28200      1.06199       0.8049       0.8049
     Factor2 |      0.22001      0.12912       0.1381       0.9431
     Factor3 |      0.09089      0.09108       0.0571       1.0001
     Factor4 |     -0.00019            .      -0.0001       1.0000
--------------------------------------------------------------------------
Could I state that the first factor explains 80% of the common variance?<<
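The Proportion column in this output is each eigenvalue divided by the sum of all four eigenvalues (the small negative one included), which is why the cumulative figure briefly exceeds 1. A quick check by hand, using only the numbers printed above (ev_sum is just a scratch name):

```stata
* Eigenvalues copied from the iterated principal factor output above
scalar ev_sum = 1.28200 + 0.22001 + 0.09089 - 0.00019
display "Factor1 share of common variance: " %6.4f 1.28200/ev_sum
display "Factor2 share of common variance: " %6.4f 0.22001/ev_sum
```

The two shares come out at 0.8049 and 0.1381, matching the table.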
Yes, it's pretty clearly one-dimensional; the rest is the junk that typically shows up in item-level factor analysis. The uniquenesses associated with the loadings are totally in line with . I also ran the ML factor analysis using:
. factormat R, n(6296) ml factors(1) names(v1 v2 v3 v4)
(obs=6296)
Iteration 0: log likelihood = -216.46349
Iteration 1: log likelihood = -65.941751
Iteration 2: log likelihood = -63.980616
Iteration 3: log likelihood = -63.905495
Iteration 4: log likelihood = -63.90257
Iteration 5: log likelihood = -63.902458
Factor analysis/correlation                    Number of obs    =     6296
Method: maximum likelihood                     Retained factors =        1
Rotation: (unrotated)                          Number of params =        4
                                               Schwarz's BIC    =  162.796
Log likelihood = -63.90246                     (Akaike's) AIC   =  135.805
--------------------------------------------------------------------------
      Factor |   Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |      1.20010            .       1.0000       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(6) = 2727.68 Prob>chi2 = 0.0000
LR test: 1 factor vs. saturated:    chi2(2) =  127.75 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
---------------------------------------
    Variable |  Factor1 |   Uniqueness
-------------+----------+--------------
          v1 |   0.4583 |       0.7900
          v2 |   0.3732 |       0.8607
          v3 |   0.7583 |       0.4250
          v4 |   0.5252 |       0.7242
---------------------------------------
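With a single unrotated factor, each uniqueness is 1 minus the squared loading, and the eigenvalue is the sum of the squared loadings. A small check using the printed figures (rounding in the displayed loadings means the last digit can differ slightly):

```stata
* Loadings copied from the ML output above
foreach l in 0.4583 0.3732 0.7583 0.5252 {
    display "loading = `l'   uniqueness = " %6.4f 1-(`l')^2
}
* Sum of squared loadings; should be close to the Factor1 eigenvalue
display "sum of squared loadings = " %7.4f 0.4583^2+0.3732^2+0.7583^2+0.5252^2
```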
With a sample this large, the chi-square tests will flag even trivial misfit, so ignore them. The loadings and uniquenesses are almost the same as for IPF (interestingly enough; that's not always true). It won't run anything higher-dimensional, but looking at that tetrachoric correlation matrix I doubt you'd find anything.
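One likely reason a second ML factor will not fit is the degrees-of-freedom count. Assuming the usual formula for an exploratory ML model with p items and k factors, df = ((p - k)^2 - (p + k))/2, a short loop shows what happens with four items:

```stata
* Degrees of freedom for an exploratory ML factor model (assumed formula)
local p = 4
forvalues k = 1/2 {
    local df = ((`p' - `k')^2 - (`p' + `k'))/2
    display "factors = `k'   df = `df'"
}
```

One factor gives df = 2, which matches the chi2(2) in the LR test above. Two factors give df = -1, so the model asks for more parameters than four items can supply.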
>>Finally, I have tried to add one or two further indicators to improve the
analysis. However, I had some theoretical doubts on the inclusion of these
variables, and the factor analysis with tetrachoric correlations gave me
loadings for these variables much lower than 0.1, thus I was convinced to
use only 4 variables.<<
Are the tetrachoric correlations for the other two variables markedly lower or still meaningful? You might have an oblique two-factor solution.
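A rough sketch of how that could be checked, assuming the raw 0/1 items are available under the placeholder names v1 to v6 (the two extra indicators are not named in this thread, and R6 is a made-up matrix name):

```stata
* Hypothetical variable names: v1-v4 are the original items, v5-v6 the extras
tetrachoric v1 v2 v3 v4 v5 v6
local n = r(N)              // save the sample size before r() is overwritten
matrix R6 = r(Rho)          // tetrachoric correlation matrix

* Two-factor solution followed by an oblique (promax) rotation
factormat R6, n(`n') ipf factors(2)
rotate, promax
estat common                // correlation between the rotated factors
```

If the extra items load on a second factor that correlates with the first, that would point toward the oblique two-factor reading rather than simply dropping them.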
Jay