Re: st: Generating a matched pair sample for a case-control study
From: "Lacy,Michael" <[email protected]>
To: "[email protected]" <[email protected]>
Date: Sun, 9 Dec 2012 20:45:05 +0000
On Fri, 7 Dec 2012, "Sacamano, Paul L." <[email protected]> wrote:
>I need to generate a sample of controls matching the age frequency distribution of the cases.
>
>Matching will be 1 (case) to 2 (controls). There are a total of 63 cases that have already been randomly
>selected, and I need to match them with 126 controls from a pool of subjects. Cases have a 30-day
>hospital readmission, controls do not. I currently have all the cases in a Stata file. The pool for
>selecting matched controls is an Excel file that I can easily copy and paste into Stata.
>
>Is there a Stata command to generate a sample of matched pairs based on the age frequency distribution
>for cases that have already been randomly selected?
>
>Thanks for the help, Paul
>
For a single matching attribute, frequency matching and pair matching are not distinguishable, right?
The following takes a file of controls and matches them 2:1, by single
year of age, with the individuals in a file of 63 cases. There may not be
enough controls at a given age to supply two for every case; the mock data
below illustrate that situation, and the last lines of the code detect it.
clear
// mock up control data
set seed 846
set obs 500 // don't know how many controls you have
gen byte case = 0
gen byte age = 20 +ceil(65*runiform()) // broad age range assumed
tempfile controls
sort age
save `controls'
clear
// mock up cases
set obs 63
gen byte case = 1
gen byte age = 20 +ceil(65*runiform())
//
// The real stuff starts here; you have an existing control file you can append to your cases.
append using `controls'
gen rand = runiform()
sort age case rand
by age: egen ncases = total(case) // number of cases at each age
keep if (ncases >=1) // age groups with no cases are irrelevant
//
// Controls sort first within each age (case == 0), in random order, so the
// following keeps all cases plus the first 2*ncases controls in each age group
by age: keep if (case ==1) | ((_n <= 2*ncases) & (case == 0))
tab2 age case
by age: egen ncontrols = total(case == 0) // number of controls kept at each age
count if (ncontrols < 2*ncases)
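The count above is a number of observations, not of age groups. As a rough sketch of a possible follow-up, the lines below list which ages fall short and then attach a matched-set identifier to each case and its controls. This assumes the data from the code above are still in memory (variables age, case, rand, ncases, ncontrols); the names agetag, withinage and setid are made up here for illustration.

```stata
// Assumes age, case, rand, ncases and ncontrols exist as created above.

// 1. One line per age group that has fewer than 2 controls per case
egen byte agetag = tag(age)
list age ncases ncontrols if agetag & (ncontrols < 2*ncases), noobs

// 2. Hypothetical matched-set identifier: within each age, the k-th case
//    (in random order) is grouped with the (2k-1)-th and 2k-th controls
bysort age case (rand): gen int withinage = cond(case == 1, _n, ceil(_n/2))
egen long setid = group(age withinage)
```

In any age group that is short of controls, the higher-numbered cases will end up in sets with fewer than two controls, so those sets would still need to be handled separately.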
Regards,
Mike Lacy
Assoc. Prof./Dir. Grad. Studies
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
970.491.6721 (voice)