Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[no subject]
This is a really nice analysis of computation error. It's nice, but
it suffers from an error -- a trivial one -- with big consequences.
Once that trivial error is fixed, the -summarize- table reads,
Variable | Obs Mean Std. Dev. Min Max
-----------+-------------------------------------------------
err | 74 1.30e-12 2382.413 -4432.274 5771.01
err_mata | 74 8.17e-10 2382.413 -4432.274 5771.01
Thus, once again, -regress- is more accurate than syminv(X'X)*X'y.
What happened is that Michael forget that Stata defaults to storing
new variables as floats whereas Mata stores everyting as doubles. F
loats are less precise then doubles. There are three lines in
Michael's code that need fixing:
change to
--------------------------------------------
predict xb predict double xb
gen err=price-xb gen double err=price-xb
getmata err_mata getmata err_mata, double
So let's understand what happened.
1. Michael used -reg price weight length- to obtain results.
Those results are fully accurate.
2. Michael then coded -predict xb-, and so obtained a less
accurate recording of the prediction than he would have
gotten using -predict double xb-. The recording he got
was not inaccurate; it was accurate to about 1 part in
10,000,000, but that is not sufficient when accessing accuracy
that is is 1 part in 1,000,000,000,000 (1 quadrillion).
3. Michael than coded -gen err=price-xb-. Variable price is
recorded in full accuracy, but xb was rounded, and thus so was
err.
4. Put that all together, and the error appeared to be
-.0000198 when it really was 1.30e-12.
5. On the Mata side, everthing went swimmingly until Michael
exited Mata and coded -getmata err_mata-. The full-precision
-err_mata- variable in Mata was copied to a less than full
fedility Stata float variable -err_mata-. That loss of precision
changed the mean from 8.17e-10 to 6.20e-06.
Still, it was an easy mistake to make. More importantly, I don't want
anyone thinking that they need to use -double- for their own work.
Stata does all calculations in -double- regardless of how data are
stored. Storing results as -float is more than sufficient for most
real-life work. I am about to have a blog posting on exactly that
subject and, as a matter of fact, my current blog posting on "How to
read %21x format" at http://blog.stata.com/ is setting me up to address
the subject next week.
Meantime, here is the code to run if anyone wants to reproduce the
above results:
-------------------------------------------
sysuse auto, clear
reg price weight length
predict double xb
gen double err = price - xb
mata:
st_view(X=., ., "weight length")
X = X, J(rows(X), 1, 1)
st_view(Y=., ., "price")
beta = invsym(X'X)*X'Y
err_mata = Y-X*beta
end
getmata err_mata, double
sum err err_mata
-------------------------------------------
-- Bill
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/