
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Dropping highly correlated variables


From   George Murray <[email protected]>
To   [email protected]
Subject   st: Dropping highly correlated variables
Date   Mon, 22 Jul 2013 19:07:13 +1000

Dear Statalist,

I am using a large dataset with 1,000 or so variables. Let's call them
v1, v2, …, v1000. My question is: if two variables have a correlation
coefficient greater than 0.99, how do I drop the variable with fewer
observations (or drop one of them arbitrarily if both have the same
number of observations)?

For example, if v123 and v456 have a correlation coefficient of 0.995
(> 0.99), and v456 has more observations than v123, how would I drop
v123?
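The decision rule for a single pair can be sketched as below. This is a minimal Python illustration of the logic, not Stata code. It assumes each variable is held as a list in which None marks a missing value, and the names `nonmissing` and `variable_to_drop` are hypothetical. "Number of observations" is taken to mean non-missing values, which is what Stata's -count if !missing(v123)- would report.

```python
def nonmissing(values):
    """Count observations that are not missing (None marks a missing value)."""
    return sum(v is not None for v in values)


def variable_to_drop(name_a, values_a, name_b, values_b):
    """Pick which of two highly correlated variables to drop.

    The variable with fewer non-missing observations is dropped; on a tie
    the second variable is dropped, an arbitrary but reproducible choice.
    """
    if nonmissing(values_a) < nonmissing(values_b):
        return name_a
    return name_b


# Hypothetical data: v123 has one missing value, v456 has none.
v123 = [1.0, 2.0, None, 4.0]
v456 = [1.1, 2.0, 3.1, 4.0]
print(variable_to_drop("v123", v123, "v456", v456))
```

With these made-up values v123 has 3 non-missing observations and v456 has 4, so v123 is the one reported for dropping, matching the example in the question.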

I know that one may use -correlate v1-v1000- and then proceed
manually (which is what I will have to do if no solution is found),
but I am hoping that someone knows of a quicker way. Incidentally,
this is not for any 'statistical' reason (e.g. multicollinearity); I
am just cleaning up the dataset.
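Automating the manual pass over the correlation matrix amounts to a loop over all pairs of variables. The sketch below is in Python rather than Stata and is only an illustration under stated assumptions: the data are a dict mapping hypothetical variable names to lists with None for missing values, and the helper names are invented. Correlations use pairwise-complete observations, in the spirit of Stata's -pwcorr- rather than -correlate-, which uses only observations that are non-missing on every listed variable (with 1,000 variables that could leave very few). The threshold follows the question literally (r > 0.99); testing abs(r) instead would also catch near-perfect negative correlation.

```python
import math
from itertools import combinations


def pairwise_corr(x, y):
    """Pearson correlation over observations where both x and y are non-missing.

    Returns None if fewer than two complete pairs exist or if either variable
    is constant on those pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def nonmissing(values):
    """Count observations that are not missing (None marks a missing value)."""
    return sum(v is not None for v in values)


def drop_correlated(data, threshold=0.99):
    """Return the variables to drop, scanning pairs in column order.

    For each pair with correlation above `threshold`, the variable with fewer
    non-missing observations is marked; on a tie, the later one is marked.
    Pairs involving an already-marked variable are skipped.
    """
    names = list(data)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue
        r = pairwise_corr(data[a], data[b])
        if r is not None and r > threshold:
            dropped.add(a if nonmissing(data[a]) < nonmissing(data[b]) else b)
    return [name for name in names if name in dropped]


# Hypothetical data: v2 is an exact multiple of v1 but has one missing value.
data = {
    "v1": [1, 2, 3, 4, 5],
    "v2": [2, 4, 6, 8, None],
    "v3": [5, 3, 4, 1, 2],
}
print(drop_correlated(data))
```

Here v1 and v2 correlate perfectly on their four shared observations, and v2 has fewer observations, so only v2 is marked; v3 (r = -0.8 with v1) is kept. Because the scan is greedy, the result can depend on variable order when several variables are mutually near-collinear. With 1,000 variables the loop checks 499,500 pairs, which is slow in pure Python but needs no manual inspection.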

Thank you,

George.


