Re: st: More re factor loadings
From: [email protected] (Kristin MacDonald, StataCorp LP)
To: [email protected]
Subject: Re: st: More re factor loadings
Date: Tue, 01 Oct 2013 08:31:27 -0500
Dave Garson <[email protected]> asks how to obtain the coefficients that
SPSS refers to as "factor score weights" and that SAS labels "latent variable
scores regression coefficients".
First, let me discuss the terminology we use in our documentation. I
recognize that different groups may call coefficients by different names, so I
want to make sure that there is no confusion. When we use the term "factor
loading", we are referring to the coefficients on paths from latent variables
to observed variables. These may be in the standardized or unstandardized
metric.
I believe that Dave would instead like the coefficients that can be used to
create a linear combination of the observed variables corresponding to the
predicted value of the latent variable. In Stata, we call these "scoring
coefficients" in the '[MV] factor postestimation' manual entry where we
discuss predictions of factors with exploratory factor analysis.
There is no option to automatically obtain a matrix of regression scoring
coefficients after fitting a model with -sem-. However, if Dave is interested
in obtaining the predicted factor scores, he can use -predict- with the
-latent()- option. For example,
webuse sem_1fmm, clear
sem (X -> x1 x2 x3 x4)
predict xpred, latent(X)
This creates a new variable, xpred, containing the predicted value of X.
If Dave is interested in the actual coefficients used in the linear
combination that produces these predictions, he can create them manually using
the matrices returned by -estat framework- after -sem-. In the case of a
standard CFA model, the coefficients are a function of the -r(Sigma)- matrix:
the inverse of the covariance block for the observed variables, multiplied by
the covariances between the observed variables and the latent variable.
These coefficients are applied to the observed variables after they have been
centered. The -r(mu)- matrix contains the fitted mean of each variable, which
we can use to center the observed variables. The code below demonstrates how
to predict the value of the latent variable X manually for the above model:
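* Retrieve the fitted means and covariances of observed and latent variables
estat framework, fitted
mat mu = r(mu)
mat sigma = r(Sigma)
* Covariance block for the observed variables x1-x4, and its inverse
mat sigma_zz = sigma[1..4,1..4]
mat inv_sigma_zz = syminv(sigma_zz)
* Covariances between the latent variable X (row 5) and x1-x4
mat sigma_zl = sigma[5,1..4]
* Scoring coefficients (4 x 1)
mat scoef = inv_sigma_zz*sigma_zl'
mat list scoef
* Center each observed variable at its fitted mean
forvalues i = 1/4 {
    gen x`i'_cent = x`i' - mu[1,`i']
}
* Apply the scoring coefficients to the centered variables
gen mypred = scoef[1,1]*x1_cent + scoef[2,1]*x2_cent + ///
    scoef[3,1]*x3_cent + scoef[4,1]*x4_cent
* Compare with the values from -predict-
list xpred mypred in 1/10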
The coefficients are stored in the scoef matrix and are then used to predict
the value of X in a new variable called mypred. The values of mypred match
those produced by the -predict- command above, up to rounding in the last
displayed digit. The output for the full set of commands is given below my
signature.
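In matrix notation, the manual calculation above is regression scoring. The
following is only a sketch: the symbols z, Lambda, Phi, Psi, and b are
notation assumed here, chosen to mirror the Gamma, Phi, and Psi matrices
reported by -estat framework-; they do not appear in the code itself.

```latex
\begin{aligned}
\Sigma_{zz} &= \Lambda \Phi \Lambda' + \Psi, \qquad \Sigma_{zX} = \Lambda \Phi, \\
b &= \Sigma_{zz}^{-1} \, \Sigma_{zX}, \\
\hat{X}_i &= b' \, (z_i - \mu_z).
\end{aligned}
```

Here z_i holds x1-x4 for observation i, mu_z holds their fitted means from
-r(mu)-, Sigma_zz and Sigma_zX are the observed block and the observed-by-latent
block of -r(Sigma)-, and b corresponds to the scoef matrix. No latent-mean term
appears because the fitted mean of X is 0 in this model.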
More complicated models containing structural paths not included in a CFA
model will require more matrix calculations that involve the fitted structural
path coefficients.
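For the standard CFA case only (not the more complicated models just
mentioned), the manual steps can also be written without hard-coding each
term. This is a minimal sketch, assuming the same dataset and single-factor
model as above and assuming that the observed variables occupy the first rows
and columns of -r(Sigma)- with the latent variable last. The names vars, k,
kp1, and mypred2 are introduced here for illustration, and the tolerance
passed to -assert- is an arbitrary choice.

```stata
* Refit the single-factor model and obtain the built-in factor score
webuse sem_1fmm, clear
sem (X -> x1 x2 x3 x4)
predict double xpred, latent(X)

* Fitted means and covariances of observed and latent variables
estat framework, fitted
matrix mu    = r(mu)
matrix sigma = r(Sigma)

* Assumption: x1-x4 are rows/columns 1..k of sigma, and X is row k+1
local vars x1 x2 x3 x4
local k : word count `vars'
local kp1 = `k' + 1
matrix sigma_zz = sigma[1..`k', 1..`k']
matrix sigma_zl = sigma[`kp1', 1..`k']
matrix scoef    = syminv(sigma_zz)*sigma_zl'

* Accumulate the linear combination of the centered variables
generate double mypred2 = 0
local i = 0
foreach v of local vars {
    local ++i
    quietly replace mypred2 = mypred2 + scoef[`i',1]*(`v' - mu[1,`i'])
}

* Check agreement with -predict-; 1e-5 is an arbitrary tolerance
compare xpred mypred2
assert reldif(xpred, mypred2) < 1e-5
```

Storing both predictions as doubles keeps rounding differences small; if the
-assert- fails on other data, inspect the -compare- output before loosening
the tolerance.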
--Kristin
[email protected]
. use sem_1fmm, clear
(single-factor measurement model)
. sem (X -> x1 x2 x3 x4)
Endogenous variables
Measurement: x1 x2 x3 x4
Exogenous variables
Latent: X
Fitting target model:
Iteration 0: log likelihood = -2081.0258
Iteration 1: log likelihood = -2080.986
Iteration 2: log likelihood = -2080.9859
Structural equation model                       Number of obs      =       123
Estimation method  = ml
Log likelihood     = -2080.9859

 ( 1)  [x1]X = 1
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
  x1 <-      |
           X |          1  (constrained)
       _cons |   96.28455   1.271963    75.70   0.000     93.79155    98.77755
  -----------+----------------------------------------------------------------
  x2 <-      |
           X |   1.172364   .1231777     9.52   0.000     .9309398    1.413788
       _cons |   97.28455   1.450053    67.09   0.000      94.4425    100.1266
  -----------+----------------------------------------------------------------
  x3 <-      |
           X |   1.034523   .1160558     8.91   0.000     .8070579    1.261988
       _cons |   97.09756   1.356161    71.60   0.000     94.43953    99.75559
  -----------+----------------------------------------------------------------
  x4 <-      |
           X |   6.886044   .6030898    11.42   0.000     5.704009    8.068078
       _cons |   690.9837   6.960137    99.28   0.000     677.3421    704.6254
-------------+----------------------------------------------------------------
    var(e.x1)|   80.79361   11.66414                      60.88206    107.2172
    var(e.x2)|   96.15861   13.93945                      72.37612    127.7559
    var(e.x3)|   99.70874   14.33299                      75.22708    132.1576
    var(e.x4)|   353.4711   236.6847                      95.14548    1313.166
       var(X)|   118.2068   23.82631                      79.62878    175.4747
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(2)   =      1.78, Prob > chi2 = 0.4111
. predict xpred, latent(X)
.
.
. estat framework, fitted
Endogenous variables on endogenous variables

             |  observed
        Beta |         x1         x2         x3         x4
-------------+--------------------------------------------
observed     |
          x1 |          0
          x2 |          0          0
          x3 |          0          0          0
          x4 |          0          0          0          0
----------------------------------------------------------

Exogenous variables on endogenous variables

             |   latent
       Gamma |          X
-------------+-----------
observed     |
          x1 |          1
          x2 |   1.172364
          x3 |   1.034523
          x4 |   6.886044
-------------------------

Covariances of error variables

             |  observed
         Psi |       e.x1       e.x2       e.x3       e.x4
-------------+--------------------------------------------
observed     |
        e.x1 |   80.79361
        e.x2 |          0   96.15861
        e.x3 |          0          0   99.70874
        e.x4 |          0          0          0   353.4711
----------------------------------------------------------

Intercepts of endogenous variables

             |  observed
       alpha |         x1         x2         x3         x4
-------------+--------------------------------------------
       _cons |   96.28455   97.28455   97.09756   690.9837
----------------------------------------------------------

Covariances of exogenous variables

             |   latent
         Phi |          X
-------------+-----------
latent       |
           X |   118.2068
-------------------------

Means of exogenous variables

             |   latent
       kappa |          X
-------------+-----------
        mean |          0
-------------------------

Fitted covariances of observed and latent variables

             |  observed                                  |   latent
       Sigma |        x1         x2         x3         x4 |          X
-------------+--------------------------------------------+-----------
observed     |                                            |
          x1 |  199.0004                                  |
          x2 |  138.5813   258.6263                       |
          x3 |  122.2876   143.3656   226.2181            |
          x4 |  813.9769   954.2769   842.0779   5958.551 |
-------------+--------------------------------------------+-----------
latent       |                                            |
           X |  118.2068   138.5813   122.2876   813.9769 |   118.2068
----------------------------------------------------------------------

Fitted means of observed and latent variables

             |  observed                                  |   latent
          mu |        x1         x2         x3         x4 |          X
-------------+--------------------------------------------+-----------
          mu |  96.28455   97.28455   97.09756   690.9837 |          0
----------------------------------------------------------------------

. mat mu = r(mu)
. mat sigma = r(Sigma)
. mat sigma_zz = sigma[1..4,1..4]
. mat inv_sigma_zz = syminv(sigma_zz)
. mat sigma_zl = sigma[5,1..4]
.
. mat scoef = inv_sigma_zz*sigma_zl'
. mat list scoef
scoef[4,1]
                latent:
                     X
observed:x1  .06875754
observed:x2  .06772851
observed:x3  .05763739
observed:x4  .10822142
.
. forvalues i = 1/4 {
  2.     gen x`i'_cent = x`i' - mu[1,`i']
  3. }
.
. gen mypred = scoef[1,1]*x1_cent + scoef[2,1]*x2_cent + ///
>     scoef[3,1]*x3_cent + scoef[4,1]*x4_cent
. list xpred mypred in 1/10
+-----------------------+
| xpred mypred |
|-----------------------|
1. | -26.55233 -26.55233 |
2. | 11.92044 11.92044 |
3. | 8.319204 8.319203 |
4. | -7.50836 -7.50836 |
5. | -3.87875 -3.878749 |
|-----------------------|
6. | .9258427 .9258427 |
7. | -4.445202 -4.445201 |
8. | 3.599469 3.599469 |
9. | -4.307086 -4.307086 |
10. | 6.506975 6.506975 |
+-----------------------+