Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: identifying age-matched controls in a cohort study
From: Phil Clayton <[email protected]>
To: [email protected]
Subject: Re: st: identifying age-matched controls in a cohort study
Date: Fri, 23 Aug 2013 18:52:41 +1000
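Here is a more efficient version. Instead of looping over the matched controls to fill in -pair-, it starts -pair- at each patient's own id and then treats any -pair- value shared by exactly two observations as a matched pair: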
* example cohort data
* 1000 exposed and 20000 unexposed people
* study period is 01jan2005 to 31dec2009
clear
set obs 21000
gen id=_n
* random date of exposure during the study period for the first 1000 people
set seed 12345
gen expdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=1000
format %td expdate
* in this simulation we'll assume that the exposed patients are 1.2x as likely
* to die as the unexposed
* we'll make 20% of the unexposed die, so 24% (240) of the 1000 exposed patients get a
* death date - but if it's before their expdate we won't count it
gen deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=240
replace deathdate=. if deathdate<=expdate
* random death date for 20% of the other 20000 people
* (assume the rest are alive at the end of 2009)
replace deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if inrange(id, 1001, 5000)
format %td deathdate
* age at start of study
gen age=rnormal(50, 5)
**** we have now finished constructing our dummy dataset ****
* now we classify patients as ever-exposed vs never-exposed
gen byte exposed=!missing(expdate)
* we need to know who's available for each match run
* at the start everyone is available
gen byte available=1
* pair is a new variable indicating each matched pair
* initially it's the patient's own id, but for matched exposed patients it will be
* replaced with the control's id
gen pair=id
* now iteratively match until every exposed patient has been matched to 1 neighbour
local finished=0
while `finished'==0 {
* randomly sort the data before running -psmatch2- (in case of ties)
gen double rsort=runiform()
sort rsort
drop rsort
psmatch2 exposed if available, pscore(age) n(1) noreplace
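* -psmatch2- creates an ID variable for everyone called _id
* for matched treated patients, the untreated match's ID is stored in
* the treated patient's _n1 variable
* anyone matched has a _weight of 1
* sorting on _id lines the row numbers up with _id, so that id[_n1] and
* deathdate[_n1] below refer to the matched control's row
sort _id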
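* the pair becomes the control's ID for exposed cases, but only if the control
* was still alive on the exposure date (a missing deathdate counts as alive,
* because Stata treats missing as larger than any date)
* and then these cases are no longer "available" for future matching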
quietly replace pair=id[_n1] if exposed & _weight==1 & deathdate[_n1]>expdate
quietly replace available=0 if exposed & _weight==1 & deathdate[_n1]>expdate
* anyone who was a matched control in this run should be made unavailable for further
* matching, including controls rejected because they died before the exposure date
* (to prevent endlessly matching exposed patients with dead ones)
quietly replace available=0 if !exposed & _weight==1
* see if we need to do any more match runs
quietly count if exposed & available
display "Patients still needing a match: " r(N)
if r(N)==0 local finished=1
}
* any pair with 2 observations is now a matched pair
* the others are unmatched
bysort pair: gen byte touse=_N==2
* confirm that we now have 1000 pairs
tab exposed if touse
* confirm that the ages are well matched
tabstat age if touse, by(exposed) s(n mean sd q)
bysort pair (exposed): gen agediff=age[1] - age[2] if touse
sum agediff, d
* confirm that the death dates of the controls are after the exposure dates
bysort pair (exposed): assert deathdate[1]>expdate[2] if touse
* now we can set up a survival analysis
* start date is the date of exposure and end date is death or 31dec2009
bysort pair (exposed): gen start=expdate[2] if touse
gen end=deathdate
replace end=td(31dec2009) if missing(end)
gen byte died=!missing(deathdate)
stset end, fail(died) origin(time start) scale(365.25) if(touse)
sts graph, by(exposed)
stcox exposed
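A possible follow-up, not part of the original post: the code above assumes that every -pair- value shared by two observations holds one exposed and one unexposed person, and the final -stcox- ignores the pairing. The sketch below assumes the dataset built above is still in memory and -stset-. It checks the first assumption and then shows two standard ways to let the Cox model reflect the matching: a separate baseline hazard per pair via -strata()-, or cluster-robust standard errors via -vce(cluster)-. Which suits a given analysis is a judgement call.

```stata
* check that every matched pair has one unexposed person (sorted first)
* and one exposed person (sorted second)
bysort pair (exposed): assert exposed[1]==0 & exposed[2]==1 if touse

* Cox model with a separate baseline hazard for each matched pair
stcox exposed, strata(pair)

* alternatively, keep a common baseline hazard but use standard errors
* that allow for correlation within each matched pair
stcox exposed, vce(cluster pair)
```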
On 23/08/2013, at 5:40 PM, Phil Clayton <[email protected]> wrote:
> There are different approaches. I use -psmatch2- (SSC) because it's quite convenient and fast. It's also trivial to extend your matching to use a propensity score rather than a single variable.
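To illustrate the propensity-score extension mentioned above: -pscore()- just takes the variable to match on, so a fitted propensity score can be dropped in where the example uses age. A minimal sketch, assuming the simulated dataset from the example below is in memory and -exposed- has already been created; -female- is a hypothetical second covariate invented here, since the example data contain only age.

```stata
* hypothetical covariate for illustration only (not in the original data)
gen byte female = runiform() < 0.5

* estimate each person's probability of ever being exposed,
* once, on the whole cohort
logit exposed age i.female
predict double ps, pr

* inside the matching loop, the -psmatch2- call then becomes:
*   psmatch2 exposed if available, pscore(ps) n(1) noreplace
```

Estimating the score once, before the loop, keeps each patient's score fixed across match runs; passing the covariates to -psmatch2- directly would refit the model on each run's -available- subsample.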
>
> You don't need to calculate the age at exposure - you can just match on age at the start of the study (or even date of birth).
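Why baseline age is enough: write $t_0$ for the study start (01jan2005), $a_k$ for person $k$'s age in years at $t_0$, $b_k$ for their date of birth in days, and $t_i$ for exposed patient $i$'s exposure date in days (notation introduced here for illustration). Exposed patient $i$ and candidate control $j$ both age by the same amount between $t_0$ and $t_i$, and date of birth is a linear function of baseline age:

$$
\begin{aligned}
\text{age gap at } t_i &= \left[a_i + \frac{t_i - t_0}{365.25}\right] - \left[a_j + \frac{t_i - t_0}{365.25}\right] = a_i - a_j,\\
\lvert b_i - b_j \rvert &= \lvert (t_0 - 365.25\,a_i) - (t_0 - 365.25\,a_j) \rvert = 365.25\,\lvert a_i - a_j \rvert.
\end{aligned}
$$

So ranking candidate controls by baseline age, by age at the exposure date, or by date of birth picks the same nearest neighbour (ties aside).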
>
> If someone was exposed part-way through the study, do you want to allow them to be a non-exposed control for someone who was exposed earlier?
>
> If not, you could use an iterative loop to match exposed patients until they've all been matched to a living unexposed control. Here is an example. It's probably not the most efficient way of doing it, but it still doesn't take too long.
>
> Phil
>
> * example cohort data
> * 1000 exposed and 20000 unexposed people
> * study period is 01jan2005 to 31dec2009
> clear
> set obs 21000
> gen id=_n
>
> * random date of exposure during the study period for the first 1000 people
> set seed 12345
> gen expdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=1000
> format %td expdate
>
> * in this simulation we'll assume that the exposed patients are 1.2x as likely
> * to die as the unexposed
> * we'll make 20% of the unexposed die, so 24% (240) of the 1000 exposed patients get a
> * death date - but if it's before their expdate we won't count it
> gen deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=240
> replace deathdate=. if deathdate<=expdate
>
> * random death date for 20% of the other 20000 people
> * (assume the rest are alive at the end of 2009)
> replace deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if inrange(id, 1001, 5000)
> format %td deathdate
>
> * age at start of study
> gen age=rnormal(50, 5)
>
> **** we have now finished constructing our dummy dataset ****
>
> * now we classify patients as ever-exposed vs never-exposed
> gen byte exposed=!missing(expdate)
>
> * we need to know who's available for each match run
> * at the start everyone is available
> gen byte available=1
>
> * pair is a new variable indicating each matched pair
> * initially it's missing for everyone
> gen pair=.
>
> * now iteratively match until every exposed patient has been matched to 1 neighbour
> local finished=0
> while `finished'==0 {
> * randomly sort the data before running -psmatch2- (in case of ties)
> gen double rsort=runiform()
> sort rsort
> drop rsort
>
> psmatch2 exposed if available, pscore(age) n(1) noreplace
>
> * -psmatch2- creates an ID variable for everyone called _id
> * for matched treated patients, the untreated match's ID is stored in
> * the treated patient's _n1 variable
> * anyone matched has a _weight of 1
> sort _id
>
> * the pair becomes the control's ID for exposed cases
> * and then these cases are no longer "available" for future matching
> quietly replace pair=id[_n1] if exposed & _weight==1 & deathdate[_n1]>expdate
> quietly replace available=0 if exposed & _weight==1 & deathdate[_n1]>expdate
>
> * now loop through the matched controls and update their pair variable to
> * become their ID (there is probably a more efficient way to do this)
> qui levelsof _n1 if exposed & _weight==1 & deathdate[_n1]>expdate, local(matches)
> qui foreach id of local matches {
> quietly replace pair=id if _id==`id'
> }
>
> * anyone who was a matched control in this run should be made unavailable for further
> * matching
> * (to prevent endlessly matching exposed patients with dead ones)
> quietly replace available=0 if !exposed & _weight==1
>
> * see if we need to do any more match runs
> quietly count if exposed & available
> display "Patients still needing a match: " r(N)
> if r(N)==0 local finished=1
> }
>
> * pairs now have a "pair" variable; these are the observations we want to use
> gen byte touse=!missing(pair)
>
> * confirm that we now have 1000 pairs
> tab exposed if touse
>
> * confirm that the ages are well matched
> tabstat age if touse, by(exposed) s(n mean sd q)
> bysort pair (exposed): gen agediff=age[1] - age[2] if touse
> sum agediff, d
>
> * confirm that the death dates of the controls are after the exposure dates
> bysort pair (exposed): assert deathdate[1]>expdate[2] if touse
>
> * now we can set up a survival analysis
> * start date is the date of exposure and end date is death or 31dec2009
> bysort pair (exposed): gen start=expdate[2] if touse
> gen end=deathdate
> replace end=td(31dec2009) if missing(end)
> gen byte died=!missing(deathdate)
> stset end, fail(died) origin(time start) scale(365.25) if(touse)
> sts graph, by(exposed)
> stcox exposed
>
>
>
>
> On 21/08/2013, at 6:49 AM, "Smit, Menno" <[email protected]> wrote:
>
>> Dear all,
>>
>> I am analysing data from a large cohort study in which some individuals become exposed during the 5 year observation period. For each exposed individual, how can I identify the nearest age-matched, unexposed individual that is alive on the date that the exposed become exposed?
>>
>> Many thanks,
>> Menno
>>
>> MD in Tropical Medicine “Mother & Child Health”
>> Research Assistant in Malaria Epidemiology
>> KEMRI/CDC, P.O.Box 1578, Kisumu 40100, Kenya.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
>