A query re -_rmcoll-. As I understand it, this is implemented at the
executable level. (At any rate, I cannot find a -_rmcoll.ado- anywhere under
the c:\Stata8\ados\ path on my Windows 2000 system.) It is therefore not
immediately obvious exactly how it chooses which of a set of collinear
variables to drop. However, I get the impression (by experimenting) that it
builds a list of non-collinear variables iteratively by starting with an
empty list and iterating along the -varlist- of candidates provided, from
the first variable to the last in the given order, testing at each step
whether the latest candidate is a linear combination of the variables
already in the non-collinear list, and adding the candidate to the list if
it isn't. In "pidgin Stata", I get the impression that it works something
like this:

    local ncvarlist ""
    foreach X of var `origvarlist' {
        if islinearlydependent(`X' `ncvarlist') {
            disp as text "Note: `X' dropped due to collinearity"
        }
        else {
            local ncvarlist "`ncvarlist' `X'"
        }
    }

where -ncvarlist- is the list of non-collinear variables being assembled,
-origvarlist- is the original variable list provided, and
-islinearlydependent(varlist)- is a fantasy function returning 1 if the
variables of -varlist- are linearly dependent and 0 otherwise. I get that
impression because, if I include two variables with identical values in a
list of X-variables, it is the second one that is dropped, not the first.
Can I assume that, in general, this is how -_rmcoll- works? And, if not,
then how does it work?
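
To make the conjecture concrete, here is a minimal sketch of the procedure described above, written in Python rather than Stata so that the "fantasy function" can be spelled out. It is only an illustration of my guess, not a description of how -_rmcoll- is actually implemented. The names is_linearly_dependent and remove_collinear and the toy data are hypothetical, the dependence test uses exact rational arithmetic where a real implementation would presumably use a numerical tolerance, and the sketch ignores any constant term.

```python
from fractions import Fraction


def is_linearly_dependent(columns):
    """Return True if the given equal-length columns are linearly dependent.

    Hypothetical stand-in for the "fantasy function" in the pseudocode.
    Uses exact Gaussian elimination; dependence means rank < number of columns.
    """
    rows = [[Fraction(v) for v in col] for col in columns]
    n_obs = len(rows[0]) if rows else 0
    rank = 0
    for j in range(n_obs):
        # Find a row at or below the current rank with a nonzero entry in position j.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][j] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate position j from all rows below the pivot row.
        for i in range(rank + 1, len(rows)):
            factor = rows[i][j] / rows[rank][j]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank < len(rows)


def remove_collinear(data, varlist):
    """Scan varlist from first to last, keeping each variable only if it is
    not a linear combination of the variables kept so far (the conjectured rule)."""
    kept, dropped = [], []
    for name in varlist:
        if is_linearly_dependent([data[v] for v in kept + [name]]):
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Toy data: x2 duplicates x1, and x4 equals x1 + x3.
data = {
    "x1": [1, 2, 3, 4],
    "x2": [1, 2, 3, 4],
    "x3": [0, 1, 0, 1],
    "x4": [1, 3, 3, 5],
}
kept, dropped = remove_collinear(data, ["x1", "x2", "x3", "x4"])
print("kept:", kept)
print("dropped:", dropped)
```

Under this rule the outcome depends on the order of the list: with x2 placed before x1, it would be x1 that is dropped. That order dependence is the behaviour I am asking about.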
Best wishes (and thanks in advance)
Roger
--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom
Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: [email protected]
Website: http://phs.kcl.ac.uk/rogernewson/
Opinions expressed are those of the author, not the institution.