Dear Stata-listers,
I need to draw 10 random controls per case for an epidemiological study.
Controls are matched to cases on birth month and gender. I am using the
-sample- command, and my problem can be demonstrated with the following:
sysuse cancer, clear
bysort drug: gen case = _n==1
* Here, controls are matched to cases on drug, not birth month and gender
sample 10 if !case, count by(drug)
tab case drug
I expected this command to draw 10 random persons with case==0 from each
drug group and to keep all three cases (case==1). The problem is that I
sometimes get a result like this:
           |     Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |         9          9          9 |        27
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        10         10         10 |        30
and sometimes like this:
           |     Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |         9         10         10 |        29
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        10         11         11 |        32
But I expect the following, which I also get on occasion:
           |     Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |        10         10         10 |        30
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        11         11         11 |        33
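For comparison, one way to guarantee exactly 10 controls per drug group without depending on how -sample- combines -if- with -by()- is to give every observation a uniform random number, sort on it within each drug-by-case stratum, and keep the first 10 controls. This is a minimal sketch and not the method used above; the seed value and the variable name u are arbitrary choices for illustration, and -runiform()- assumes Stata 10.1 or later (older versions provide -uniform()-).

```stata
sysuse cancer, clear
bysort drug: gen case = _n==1

* Arbitrary seed, set only so that the draw can be reproduced
set seed 20110101
* Hypothetical helper variable: one uniform random number per observation
gen double u = runiform()

* Within each drug-by-case stratum, observations are sorted on u.
* Every case is kept; among controls, the first 10 (a random 10) are kept.
bysort drug case (u): keep if case | _n <= 10

drop u
tab case drug
```

In the real study the matching variables (birth month and gender) would take the place of drug; if a stratum held more than one case, the limit of 10 would have to scale with the number of cases in that stratum.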
The -help sample- file has no examples that combine -if- and -by()-, and I
suggest that Stata's behaviour in this case be documented. As a workaround,
I now save the cases to a file, drop them from the data, run
-sample 10, by(drug) count-, and -append- the cases back on. This is no big
hassle, but it took me a long time to discover that the -sample- command was
responsible for the varying number of controls per case.
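The save/drop/sample/append workaround described above could look like this on the same -cancer- example. It is a sketch under the assumption that the cases are flagged by a 0/1 variable case; the temporary file name cases is a hypothetical choice for illustration.

```stata
sysuse cancer, clear
bysort drug: gen case = _n==1

* Set the cases aside in a temporary file (hypothetical name: cases)
tempfile cases
preserve
keep if case
save `cases'
restore

* Draw 10 controls per drug group from the controls only
drop if case
sample 10, count by(drug)

* Put the cases back
append using `cases'
tab case drug
```

Because the cases are removed before -sample- runs, every by-group contains only controls, so the count of 10 applies to controls alone.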
Best regards,
Peter.