Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: matrix operations in STATA and MATA do not match!
From
Wu Zhang <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: matrix operations in STATA and MATA do not match!
Date
Sun, 7 Apr 2013 21:06:57 -0700 (PDT)
Hi Bill,
Thank you very much for your professional help!
Wu
________________________________
From: "William Gould, StataCorp LP" <[email protected]>
To: [email protected]
Sent: Friday, April 5, 2013 11:05 AM
Subject: Re: st: matrix operations in STATA and MATA do not match!
Wu Zhang <[email protected]> writes,
> I am trying to calculate the ordinary least square by matrix; the
> formula is beta_hat=(X'*X)^(-1)*X'*Y
>
> In STATA, I run the code: matrix beta_hat= syminv(X'*X)*X'*Y
>
> In MATA, I run the code: beta_hat= luinv(X'*X)*X'*Y
>
> Interestingly, all the estimates of parameters of Xs are exactly
> the same while the estimates for the constant are different. Could
> you guys explain how it happens?
I have a do-file to show you, but first, here are my comments, some of
which are demonstrated in the do-file's results:
1. Mata's -luinv()- and Stata's -syminv()- are different
inverters and produce numerically different answers.
I will not go into the advantages and disadvantages of
each here.
2. Mata provides other inverters, too. I will not go into
the advantages and disadvantages of each of them here.
3. That Mata inverter that is identical to the Stata inverter
-syminv()- is -invsym()-. Yes, I know the names are turned
around. The Mata name -invsym()- is more logical than
than the Stata name. -invsym()- is one way of calculating
the inverse of a symmetric matrix. The Stata name is older.
4. Mata's -invsym()- and Stata's -invsym()- produce identical
results.
5. The results produced by all functions are nearly identical
in a simple example I provide, an example that is poorly scaled.
The maximum relative error is 1.95e-13.
6. I say results are "nearly identical", but I should mention that,
for a "simple" problem like this one, a relative difference of
1.95e-13 against *TRUTH* would never make it out the door at
StataCorp. -regress- does a far more accurate job of producing
regression estimates.
7. The problem is not with -syminv()-, -invsym()-, or -luinv()-.
Each of those functions is doing exactly what it is designed
to do, and accurately. The problem is that (X'X)^(-1)*X'y is a
lousy way of calculating regression coefficients, numerically
speaking. For well-scaled problems, however, (X'X)^(-1)*X'y
can do a satisfactory and even an excellent job.
8. FYI, -regress- uses -syminv()- (equal to -invsym()-), but it
also uses another inverter, too, and it uses a solver. That
is, -regress- inverts the matrix three times in three different
ways to obtain accurate answers!
Inversion is only part of the problem with (X'X)^(-1)*X'y. The
other problems are the calculations of of (X'X) and X'y. Mata
provides functions to calculate these values accurately. See
-help mata crossdev()- and -help quadcross()-. -regress- does
the accumulation deviation form along the lines of crossdev(),
and in some places, uses quad precision.
I apologize to Lu if it seems like I'm coming down hard on him. I do
appreciate that Lu was asking an innocent and reasonable question. I
ask everyone, however, to be careful and not to write Subject lines
like "matrix operations in STATA and MATA do not match!" unless they
are certain they are right.
On the other hand, by writing that subject line, Lu did get an answer.
The following Stata log may intrest Lu:
-----------------------------------------------------------------------
. *
. * Step 1: post some of auto.dta to Mata
. *
. webuse auto
(1978 Automobile Data)
. drop if rep78==.
(5 observations deleted)
. tomata price mpg weight length turn foreign
.
. *
. * Step 2: In Mata, calculate (X'X)^(-1)*X'y
. * using invsum() and luinv().
. * Show estiamtes, difference, and relative difference
. *
. mata:
------------------------------------------------- mata (type end to exit) -----
: X = (mpg, weight, foreign, J(st_nobs(), 1, 1))
: y = price
: b1 = invsym(X'X)*X'y
: b2 = luinv(X'X)*X'y
: b1, b2, b1-b2, reldif(b1,b2)
1 2 3 4
+-------------------------------------------------------------+
1 | 34.34101721 34.34101721 -6.89937e-12 1.95223e-13 |
2 | 3.6062878 3.6062878 -4.70735e-14 1.02194e-14 |
3 | 3678.584438 3678.584438 -3.54703e-11 9.63976e-15 |
4 | -6639.010119 -6639.010119 3.43789e-10 5.17754e-14 |
+-------------------------------------------------------------+
: /*
> *
> * Step 3: Post matrices X and y back to Stata
> * Calculate (X'X)^(-1)*X'y in Stata.
> * Post result back to Mata.
> * Show difference in between b2 and bstata.
> * They will be identical.
> */
: st_matrix("X", X)
: st_matrix("y", y)
: end
-------------------------------------------------------------------------------
. matrix bstata = syminv(X'*X)*X'*y
. mata:
------------------------------------------------- mata (type end to exit) -----
: bstata = st_matrix("bstata")
: b1, bstata, b1-bstata
1 2 3
+----------------------------------------------+
1 | 34.34101721 34.34101721 0 |
2 | 3.6062878 3.6062878 0 |
3 | 3678.584438 3678.584438 0 |
4 | -6639.010119 -6639.010119 0 |
+----------------------------------------------+
: end
-----------------------------------------------------------------------
-- Bill
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/