-gsample- seems to do it right (see -ssc d gsample-):
. sysuse cancer, clear
(Patient Survival in Drug Trial)
. bysort drug: gen case=_n==1
. gsample 10 if !case, wor strata(drug) keep
(15 observations deleted)
. tab case drug
           |     Drug type (1=placebo)
      case |         1          2          3 |     Total
-----------+---------------------------------+----------
         0 |        10         10         10 |        30
         1 |         1          1          1 |         3
-----------+---------------------------------+----------
     Total |        11         11         11 |        33
ben
Peter wrote:
> I need 10 random controls matched per case for an epidemiological study.
> Controls are matched to cases on birth month and gender. I am using the
> -sample- command, and my problem can be demonstrated with the following:
>
> sysuse cancer, clear
> bysort drug: gen case=_n==1
> // Here, controls are matched to cases on drug, not birth month and gender.
> sample 10 if !case, count by(drug)
> tab case drug
>
> I expected this command to draw 10 random persons with case==0 from each
> drug group and keep all three with case==1. The problem is that I
> sometimes get a result like this:
>
>            |     Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |         9          9          9 |        27
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        10         10         10 |        30
>
> and sometimes like this:
>
>            |     Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |         9         10         10 |        29
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        10         11         11 |        32
>
>
> But I expect the following, which I also get on occasion:
>
>            |     Drug type (1=placebo)
>       case |         1          2          3 |     Total
> -----------+---------------------------------+----------
>          0 |        10         10         10 |        30
>          1 |         1          1          1 |         3
> -----------+---------------------------------+----------
>      Total |        11         11         11 |        33
>
>
> The -help sample- file has no examples with both -if- and -by-, and I
> suggest that Stata's behaviour be described. I am now using a workaround
> where I save the cases to a file, delete them, -sample 10, by(drug) count-
> and -append- the cases back on. This is no big hassle, but it took me a
> long time to discover that the -sample- command was responsible for the
> varying number of controls per case.
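
For anyone reading along, the workaround Peter describes could be sketched roughly as below. This is only a minimal sketch under a few assumptions of mine: the tempfile name `cases' and the seed value are arbitrary choices for illustration, and -preserve-/-restore- is just one way to set the cases aside. Run it as one block (for example from a do-file) so that the local macro created by -tempfile- stays defined.

```stata
sysuse cancer, clear
set seed 12345                  // arbitrary seed, only so the draw can be repeated
bysort drug: gen case = _n==1   // flag one "case" per drug group, as in the example

* Set the cases aside in a temporary file
preserve
keep if case
tempfile cases                  // `cases' is a hypothetical name
save `cases'
restore

* Draw 10 controls per drug group from the controls only
drop if case
sample 10, count by(drug)

* Put the cases back and check the counts
append using `cases'
tab case drug
```

Because the cases are removed before -sample- is called, no -if- qualifier is needed and the draw applies to controls only.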