
Re: st: Interesting numerical accuracy/collinearity issue


From   "Rodrigo A. Alfaro" <[email protected]>
To   <[email protected]>
Subject   Re: st: Interesting numerical accuracy/collinearity issue
Date   Wed, 12 Apr 2006 21:07:45 -0400

Just in case: Numerical Recipes book is available on-line

http://www.numerical-recipes.com/nronline_switcher.html

Rodrigo.


----- Original Message ----- 
From: "Stas Kolenikov" <[email protected]>
To: <[email protected]>
Sent: Wednesday, April 12, 2006 5:45 PM
Subject: Re: st: Interesting numerical accuracy/collinearity issue


At your leisure (probably a few years after retirement???...), you
might want to check a reference like Numerical Recipes or Demmel's
Applied Numerical Linear Algebra book from SIAM. Finite-precision
arithmetic can be set up formally as a space of its own, and it is a
really strange space: the series \sum 1/n converges there, for
instance, by the difference-in-partial-sums criterion, since once 1/n
falls below the precision of the accumulated sum, the partial sums
stop changing. So yes, you can think of linear algebra augmented by
finite precision. Stata is generally aware of the strange properties
of that space, so commands like -_rmcoll- or -issymmetric()- or
-diag0cnt()- compare differences of matrix entries, or eigenvalues,
with zero only up to that finite precision. If a roundoff error is
introduced at some step by the shifting and scaling, -_rmcoll- will
still be able to tell whether there is collinearity, up to the
numerical accuracy of the X'X matrix.
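
If you want to poke at that space yourself, here is a minimal sketch
(the generated variable y is made up purely for illustration;
-_rmcoll- is the internal command that estimation routines call, and
if memory serves it leaves the surviving variable list in r(varlist)):

    . display epsdouble()                // machine precision for doubles, about 2e-16
    . display 1 + epsdouble()/2 == 1    // an addition below that precision simply vanishes

    . sysuse auto, clear
    . generate double y = 2*weight + 1  // exact linear combination of weight and the constant
    . _rmcoll weight y                  // flags the collinearity, up to roundoff
    . display "`r(varlist)'"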

The general principles of finite-precision arithmetic are usually
stated in terms of condition numbers: roughly, the largest possible
change in the answer due to an infinitesimal change in the inputs of
the procedure. Invoking the appropriate infinitesimal (mathematical
rather than computer!) calculus, the condition number for many matrix
operations, like inversion, determinants, or solving linear systems,
can be shown to be the ratio of the largest to the smallest
eigenvalue. Suppose this ratio for your particular matrix is 10^4
(which is not that huge; in -reg pri wei for trunk disp- with the auto
data, the condition number of the covariance matrix is 4e8). Then by
taking your variables to the fourth power you push that number toward
10^16 (not exactly, but you can think of that as the worst-case
scenario), and that is already beyond the double-precision arithmetic
routinely employed (maybe in the guts of Stata there is also quad
arithmetic; I have come across it a couple of times in Mata):
epsdouble() = 2e-16, so a single blip of that order leads to a change
of about epsdouble()*condition#, which is of order one: you cannot
trust even the first digit of your answer. That is, roughly, why
unscaled variables are bad; why you should center your variables; and
why integrated processes lead to weird distributions... oops, that's
another story, sorry :))
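
To watch that eigenvalue ratio move on the auto data, here is a sketch
(Stata 9 or later, so that Mata is available; w, w4, and wc4 are
illustrative names, and since -matrix accum- builds X'X rather than
the covariance matrix, the numbers will differ from the 4e8 above):

    . sysuse auto, clear
    . generate double w   = weight/1000           // rescale so the powers stay manageable
    . generate double w4  = w^4                   // raw fourth power
    . quietly summarize w
    . generate double wc4 = (w - r(mean))^4       // center first, then take the power

    . matrix accum XX  = w w4                     // X'X, constant included
    . matrix accum XXc = w wc4

    . mata:
    :     L = symeigenvalues(st_matrix("XX"))
    :     max(L)/min(L)                           // condition number, inflated by the raw power
    :     Lc = symeigenvalues(st_matrix("XXc"))
    :     max(Lc)/min(Lc)                         // much smaller once the variable is centered
    : end

Once max(L)/min(L) creeps within a few orders of magnitude of
1/epsdouble(), the arithmetic above says the inverse is garbage, which
is exactly the situation -_rmcoll- has to adjudicate, and why
centering before powering buys you digits back.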

On 4/12/06, Schaffer, Mark E <[email protected]> wrote:
> My follow-up question is simple: why does the shifting and scaling used by
> Stata's -ovtest- introduce greater accuracy rather than, say, greater
> rounding error?  (Either accuracy or error would remove the numerical
> collinearity.)  The algebra doesn't help me here, since all three methods
> are algebraically equivalent.  I'm guessing that there's probably a general
> principle about how best to maintain numerical precision, but I don't know
> what it might be.


--
Stas Kolenikov
http://stas.kolenikov.name

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


