More details would be helpful, but using what you have...
Suppose you have a variable -subject- indicating (0/1) whether
an observation is a subject, and that -age- is an integer. Then
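* count the subjects in each age stratum
bys age: egen numsubs = sum(subject)
* random sort key used to draw the controls
gen random = uniform()
* allow up to 20 controls per subject in the stratum
gen numcontrols = 20*numsubs
* flag the first numcontrols non-subjects, in random order
bys age subject (random): gen control = _n<=numcontrols & !subject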
would do it. You can generalize this by creating a variable -group-
that stratifies observations by your matching criteria and using it
in place of -age-.
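
As a rough sketch of that generalization, suppose the controls had to
match on age and on a second variable. Here -sex- is purely a
hypothetical example of such a variable, and the seed value is
arbitrary; -egen, group()- builds one stratum per combination.

```stata
* Sketch only: -sex- is a hypothetical second matching variable.
* Arbitrary seed so that the random draw can be reproduced.
set seed 12345
* One stratum per combination of the matching variables.
egen group = group(age sex)
bys group: egen numsubs = total(subject)
gen double random = runiform()
gen numcontrols = 20*numsubs
* Flag the first numcontrols non-subjects in each stratum, in random order.
bys group subject (random): gen byte control = _n<=numcontrols & !subject
```

Note that this flags a pool of controls for each stratum rather than
pairing each control with a particular subject, and a stratum with
fewer than 20*numsubs non-subjects simply contributes all of them.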
hth,
Jeph
raoul reulen wrote:
Dear Statalisters
I need to select up to 20 controls for each of 10,000 subjects from a
dataset of around half-a-million subjects. The controls need to
satisfy certain criteria (e.g., same age). How can I do this without
having to loop over observations? Thanks.
Raoul
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/