Statalist



st: Combining multiple survey data sets


From   James Swartz <[email protected]>
To   [email protected]
Subject   st: Combining multiple survey data sets
Date   Sun, 14 Feb 2010 16:02:43 -0600

All,

I searched for information on this topic and found a bit in archived threads, but not as much detail as I need. So at the risk of some redundancy, I would like to ask for help from a sampling statistician out there who also knows Stata well and who has worked with multiple survey data sets:

I am using two data sets. One is the National Comorbidity Survey Replication (NCS-R) and the other is a data set based on data I collected locally using the same instrument as the NCS-R. The N's are very different: the NCS-R part 2 is around 5,000 to 6,000 cases and my data set has only about 450 cases. Each data set has different survey parameters. I have no PSUs but do have stratification on gender and I developed weights to account for non-coverage and non-response. The NCS-R data set includes variables for weights, strata, and PSUs. Here are my questions:

1) In a simple bivariate analysis, I want to compare the prevalences of chronic medical conditions in each data set. But how can I tell Stata to use one set of survey parameters for cases in the NCS-R and another for cases in my local data set? Also, how important is it to apply a finite population correction? I have not done this in any previous analyses.

2) In a second step, I used the PSMATCH2 add-on to create a matched sample of 450 cases from the NCS-R data set based on a selected set of demographics and other characteristics. I then want to run logistic regressions on the odds of having a chronic medical condition while controlling for the matching variables (the matches were not perfect) and other unmatched characteristics. I assume that at this point the survey parameters are not applicable because there is no way (that I can figure) to apply the subpopulation option. Is that correct? Is this analytic model reasonable given the data sets available, or would there be a better way to approach this problem?

Thanks for any help. I have been scratching my head on this one for a while.

James

--

James Swartz, Ph.D., Associate Professor
Jane Addams College of Social Work, University of Illinois at Chicago
1040 W. Harrison Street (MC 309)
Chicago IL. 60607

http://www.uic.edu/jaddams/college/subabuse/

P 312-996-8560
F 312-996-2770
C 312-961-3843

E (W): [email protected]
E (H): [email protected]

"That which stands in the way of our work, is our work."
        - Marcus Aurelius



