Dear Stata Users,
I have a quick question. I am working with two datasets. The first one
(dataset "A") contains 3,500 cases, the second one (dataset "B") contains
150,000 cases.
In dataset "A", variable "x" equals 1; in dataset "B", it equals 0. For a
pre-study, I would like to analyze how variable "x" influences the survival
chances of a subject. Both datasets contain further variables, but the
distributions of these variables differ between them. For example, welfare
can take values between 0 and 10, and in sample "B" welfare is generally
higher than in sample "A". Since I am not interested in the effect of
welfare, I would like to draw a stratified sample from dataset "B" whose
subjects have the same distribution of welfare as the subjects in sample
"A". The datasets contain many other variables, so I would like to extend
this procedure to more than one variable.
Put differently, I would like to tell Stata: 'Generate a sample "C" from
dataset "B" that has the same proportions of values for "welfare", "age",
"location", ... as sample "A".'
Finally, I would combine the stratified sample of "B" and the full sample of
"A" and run a logit model.
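The sampling step described above amounts to frequency matching: treat each
combination of the matching variables as a stratum, compute each stratum's
share in "A", and draw that share of the target size at random from the
matching cases in "B". The sketch below shows only that logic, written in
Python rather than Stata; the function name stratified_match, the record
layout (one dict per case), and the target size n_target are hypothetical
choices for illustration. Continuous variables such as age would first have
to be grouped into categories, or almost no stratum of "A" will find exact
matches in "B".

```python
import random
from collections import Counter, defaultdict


def stratified_match(sample_a, pool_b, keys, n_target, seed=0):
    """Hypothetical helper: draw cases from pool_b so that the joint
    distribution of the variables in `keys` mirrors sample_a."""
    rng = random.Random(seed)

    def stratum(rec):
        # A stratum is one combination of values of the matching variables.
        return tuple(rec[k] for k in keys)

    # Share of each stratum in sample A.
    counts_a = Counter(stratum(r) for r in sample_a)
    total_a = sum(counts_a.values())

    # Group the candidate cases in B by stratum.
    by_stratum = defaultdict(list)
    for rec in pool_b:
        by_stratum[stratum(rec)].append(rec)

    sample_c = []
    for s, count in counts_a.items():
        # Rounding means the final size can differ slightly from n_target.
        want = round(n_target * count / total_a)
        available = by_stratum.get(s, [])
        if len(available) < want:
            raise ValueError(f"stratum {s}: need {want}, B has {len(available)}")
        # Sample without replacement within the stratum.
        sample_c.extend(rng.sample(available, want))
    return sample_c
```

For instance, if 75% of "A" falls into one welfare/age cell and 25% into
another, a target of 200 cases draws 150 and 50 cases from the corresponding
cells of "B", and cells that never occur in "A" contribute nothing. The error
branch makes explicit that this only works when "B" has enough cases in every
stratum that "A" occupies.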
Is there a simple way of doing this in Stata? How can I generate a
stratified sample with Stata?
Thanks for your help
Simon
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/