Statalist The Stata Listserver



Re: st: RE: What have I forgotten...?


From   Herb Smith <[email protected]>
To   [email protected]
Subject   Re: st: RE: What have I forgotten...?
Date   Fri, 20 Oct 2006 13:49:42 -0400 (EDT)

German--

	Wonderful explanation!  I will study again at greater length, but
am relieved as well as informed...

	Best,

	--Herb

Professor of Sociology and
Director, Population Studies Center
230 McNeil Building
3718 Locust Walk CR
University of Pennsylvania
Philadelphia, PA  19104-6298

[email protected]

215.898.7768 (office)
215.898.2124 (fax)

On Fri, 20 Oct 2006, German Rodriguez wrote:

> Herb,
>
> The short answer is that there's nothing wrong with your code, and the
> regression coefficients need just the right standardization to evolve into
> partial correlations.
>
> Let (y,x,z) ~ MVN(m,V). Partition m=(m1\m2) and V=(V11, V12 \ V21, V22) so y
> has mean m1 and variance V11 and the column vector x\z has mean m2 and
> variance V22.
>
> Then the conditional distribution of y|x\z is MVN with mean m1 + V12 V22^-1
> (x\z-m2) and variance V11 - V12 V22^-1 V21 [nicer-looking formulas in
> Wikipedia, see link below].
>
> We can do these calculations in Mata. In your example the unconditional
> means are all zero, so we work just with V:
>
> : V = (1, .2, .5 \ .2, 1, .2 \ .5, .2,  1)
>
> : b = V[1,(2,3)] * invsym( V[(2\3),(2,3)] )
>
> : b
>                  1             2
>     +-----------------------------+
>   1 |  .1041666667   .4791666667  |
>     +-----------------------------+
>
> The two regression coefficients are .104 and .479, just like your simulation
> shows. So the question now is why one agrees with the partial correlation
> and the other doesn't.
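
The variance formula quoted earlier (V11 - V12 V22^-1 V21) can be evaluated the same way. The following is only a sketch, assuming the same V with variables ordered y, x, z; the names V12, V22 and s2 are introduced here for illustration and do not appear in the original message.

```stata
mata:
// Correlation matrix for (y, x, z), as in the message
V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)

V12 = V[1, (2,3)]          // covariances of y with (x, z)
V22 = V[(2\3), (2,3)]      // variance matrix of (x, z)

b  = V12 * invsym(V22)     // slopes of y on x and z
s2 = V[1,1] - b * V12'     // V11 - V12 V22^-1 V21: variance of y given x and z

b
s2
end
```

Because y has unit variance here, 1 - s2 is the share of the variance of y accounted for by x and z together.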
>
> The partial correlation yx.z comes from the conditional distribution of y
> and x given z, which has variance (I'll type rather than extract the values
> for clarity)
>
> : gz = (1, .2 \ .2, 1) - (.5 \ .2) * (.5 , .2)
>
> : gz
> [symmetric]
>          1     2
>     +-------------+
>   1 |  .75        |
>   2 |   .1   .96  |
>     +-------------+
>
> : corr(gz)[1,2]
>   .1178511302
>
> So the partial correlation is indeed 0.118. Note that given z the
> (conditional) variances of y and x are different.
>
> Now look at yz.x, which requires a different conditional distribution
>
> : gx = (1, .5 \ .5, 1) - (.2 \ .2) * (.2 , .2)
>
> : gx
> [symmetric]
>          1     2
>     +-------------+
>   1 |  .96        |
>   2 |  .46   .96  |
>     +-------------+
>
> : corr(gx)[1,2]
>   .4791666667
>
> And the partial correlation is indeed .479. Note that given x, the
> (conditional) variances of y and z happen to be the same. And therein lies a
> clue.
>
> Suppose we standardize each regression coefficient by multiplying it by the
> ratio of the standard deviation of the predictor to that of the outcome,
> both conditional on the other predictor.
>
> For yz.x we do nothing because the ratio is one. For yx.z we compute
>
> : b[1] * sqrt(gz[2,2]/gz[1,1])
>   .1178511302
>
> And we have the partial correlation! So all is well.
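
The same rescaling can be applied to both coefficients at once. This is a sketch under the assumption that all three variables have unit variance, so each conditional variance reduces to 1 - r^2; the names r_yx_z and r_yz_x are illustrative only.

```stata
mata:
V = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)      // order: y, x, z
b = V[1, (2,3)] * invsym(V[(2\3), (2,3)])    // slopes of y on x and z

// With unit variances, Var(a | c) = 1 - r_ac^2
r_yx_z = b[1] * sqrt((1 - V[2,3]^2) / (1 - V[1,3]^2))   // sd(x|z) / sd(y|z)
r_yz_x = b[2] * sqrt((1 - V[3,2]^2) / (1 - V[1,2]^2))   // sd(z|x) / sd(y|x)

r_yx_z, r_yz_x
end
```

The second ratio equals one only because r_xz and r_yx are both .2 in this example; with other correlations both coefficients would need rescaling.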
>
> As an aside, my favorite way of computing partial correlations like yx.z is
> to regress y on z and compute residuals y.z, then regress x on z and compute
> residuals x.z (read the dot as 'net of'). If you regress y.z on x.z you get
> a constant of zero and a slope equal to the coefficient of x in the
> regression of y on both x and z. And the correlation between y.z and x.z is
> the same as the partial correlation yx.z.
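
The residual approach can be tried on simulated data. The following is a sketch, not the code from the original question (which was not posted): the seed, the sample size, and the names C, ey, ex, b_full and b_resid are arbitrary choices made for illustration.

```stata
clear
set seed 2006                       // arbitrary seed
matrix C = (1, .2, .5 \ .2, 1, .2 \ .5, .2, 1)
drawnorm y x z, n(100000) corr(C)   // variables in the order y, x, z

* Coefficient of x in the regression of y on both x and z
regress y x z
scalar b_full = _b[x]

* Residuals of y and of x, each net of z
regress y z
predict double ey, residuals
regress x z
predict double ex, residuals

* Slope of y.z on x.z, to compare with b_full
regress ey ex
scalar b_resid = _b[ex]
display "slope on residuals: " b_resid "   full regression: " b_full

* Correlation of the residuals, to compare with the partial correlation
correlate ey ex
pcorr y x z
```

The two slopes agree up to rounding in any sample, while their closeness to .104 and the closeness of the residual correlation to .118 depend on the sample size.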
>
> Cheers,
> Germán
>
> P.S. for more readable MVN formulas see
> http://en.wikipedia.org/wiki/Multivariate_normal_distribution
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Herb Smith
> Sent: Friday, October 20, 2006 7:11 AM
> To: [email protected]
> Subject: st: What have I forgotten...?
>
> I have simulated three variables, X, Y, and Z, with means of 0, variances
> of 1, and a correlation matrix of
>
> 	Y	z
>
> X	.2	.2
>
> Y		.5
>
> I calculate (pen and paper, or -dis-) partial correlations of r_sub_yz.x =
> .479167 and r_sub_yx.z = .117851
>
> If I generate a large enough sample, I can reproduce my correlation matrix
> with -corr- and the anticipated partial correlations with -pcorr- (not to
> mention the anticipated means and standard deviations, as per -summ-)
>
> But, when I -regress- y x z (with or without -, beta-) I get
>
> b_sub_yz.x ~ .479 (as I rather imagined I would), but
>
> b_sub_yx.z ~ .104 (not ~.118)
>
> I am forgetting something elementary about the (non?)-correspondence
> between partial correlation coefficients and standardized regression
> coefficients (I should think); else there is something weird in my code...
>
> Thanks in advance,
>
> --Herb
>
> Herbert L. Smith
> Professor of Sociology and
> Director, Population Studies Center
> 230 McNeil Building
> 3718 Locust Walk CR
> University of Pennsylvania
> Philadelphia, PA  19104-6298
>
> [email protected]
>
> 215.898.7768 (office)
> 215.898.2124 (fax)
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



