Dear all,
I have data from a large number of surveys (150). There are about 300 different variables, but not all of these variables are available in all the surveys.
I've entered the information on data availability in a Stata dataset as follows:
            var001  var002  var003  var004  var005  ...  var299  var300
dataset001       1       1       .       .       1            .       .
dataset002       1       .       1       .       .            1       .
dataset003       .       1       .       .       1            .       1
...
dataset150       1       1       1       1       1            1       1
(where the value is "1" if the variable is available, "." otherwise)
I'd like to be able to determine, for a given number of surveys, which combination of surveys I should select in order to have the largest number of variables in common.
For example, if I decide to include 15 surveys in my analysis, which ones should I select to have the maximum number of variables available in all 15 surveys, and which variables are these? What if I decide to include 16 surveys instead, and so on?
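To make the objective concrete, here is a minimal sketch in Python (not Stata, and only an illustration of what I am after, not a solution). It uses the four example rows above, restricted to the seven columns that are shown; the names avail, common_vars and best_combination are made up for this example. It simply tries every combination, which works for a toy table but would not be feasible for picking 15 out of 150 surveys.

```python
from itertools import combinations

# Toy availability table: survey -> set of variables it contains.
# Only the seven columns printed in the example above are included.
ALL_VARS = {"var001", "var002", "var003", "var004", "var005", "var299", "var300"}
avail = {
    "dataset001": {"var001", "var002", "var005"},
    "dataset002": {"var001", "var003", "var299"},
    "dataset003": {"var002", "var005", "var300"},
    "dataset150": set(ALL_VARS),
}


def common_vars(surveys):
    """Return the variables available in every survey of the selection."""
    return set.intersection(*(avail[s] for s in surveys))


def best_combination(k):
    """Exhaustive search over all k-survey subsets (toy sizes only).

    Ties are broken in favour of the first subset in sorted order.
    """
    best = max(combinations(sorted(avail), k), key=lambda c: len(common_vars(c)))
    return best, sorted(common_vars(best))


for k in (2, 3):
    surveys, shared = best_combination(k)
    print(k, surveys, len(shared), shared)
```

What I am looking for is the equivalent of best_combination(15), best_combination(16), and so on, on the full 150 x 300 table.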
Unfortunately, the command mvpatterns doesn't work with such a large number of variables, and the command misschk doesn't do the job either.
Can anyone think of another way to extract this information using Stata?
Thanks,
Alex