Re: st: identifying age-matched controls in a cohort study
From: Phil Clayton <[email protected]>
To: [email protected]
Subject: Re: st: identifying age-matched controls in a cohort study
Date: Fri, 23 Aug 2013 17:40:41 +1000
There are different approaches. I use -psmatch2- (SSC) because it's quite convenient and fast. It's also trivial to extend your matching to use a propensity score rather than a single variable.
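As a rough sketch of that extension (separate from the worked example further down): if you list covariates after the treatment variable, -psmatch2- estimates the propensity score itself instead of taking it from pscore(). The toy data and the variable female are made up for illustration, and -psmatch2- needs to be installed from SSC first.

```stata
* requires: ssc install psmatch2
clear
set seed 2013
set obs 2000
* hypothetical toy data: 200 exposed, 1800 unexposed
gen byte exposed = _n <= 200
gen double age = rnormal(50, 5)
gen byte female = runiform() < 0.5
* covariates after the treatment variable: -psmatch2- fits the propensity
* model itself (probit by default; logit is requested here)
psmatch2 exposed age female, logit neighbor(1) noreplacement
* matched exposed observations have _weight==1 and their control's _id in _n1
count if exposed & _weight==1
```

The rest of the matching logic would stay the same; only the -psmatch2- call changes.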
You don't need to calculate the age at exposure - you can just match on age at the start of the study (or even date of birth).
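The reason this works is that the age gap between two people is the same on every date, so the nearest neighbour on age at the start of the study (or on date of birth) is also the nearest neighbour on age at exposure. A small check with made-up birth dates and an arbitrary exposure date:

```stata
clear
set seed 2013
set obs 5
* hypothetical dates of birth between 1955 and 1964
gen double dob = td(01jan1955) + floor(runiform()*3650)
format %td dob
* age at study start and age at an arbitrary exposure date
gen double age_start = (td(01jan2005) - dob)/365.25
gen double age_exp   = (td(01jul2007) - dob)/365.25
* age gap between each person and the first person, on both dates
gen double gap_start = age_start - age_start[1]
gen double gap_exp   = age_exp - age_exp[1]
* the gaps agree (up to rounding), so the matching is unaffected
assert abs(gap_start - gap_exp) < 1e-8
list dob age_start age_exp gap_start gap_exp
```

In the matching call this would just mean pscore(dob) in place of pscore(age), assuming the data hold a dob variable.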
If someone was exposed part-way through the study, do you want to allow them to be a non-exposed control for someone who was exposed earlier?
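If you do want that, each exposed person needs their own pool of candidates: everyone still alive and not yet exposed on that person's exposure date. Here is a minimal sketch on a toy cohort. It uses -cross- rather than -psmatch2- and matches with replacement (so a control can be reused), and the case_ and ctrl_ variable names are made up for illustration.

```stata
clear
set seed 2013
set obs 300
gen long id = _n
* hypothetical toy cohort: the first 30 people become exposed during 2005-2009
gen expdate = td(01jan2005) + floor(runiform()*1826) if id <= 30
gen deathdate = td(01jan2005) + floor(runiform()*1826) if runiform() < 0.2
* discard deaths on or before exposure (only relevant for the exposed)
replace deathdate = . if deathdate <= expdate & !missing(expdate)
gen double age = rnormal(50, 5)
format %td expdate deathdate

* save one row per exposed person (the "cases")
preserve
keep if !missing(expdate)
keep id expdate age
rename (id expdate age) (case_id case_expdate case_age)
tempfile cases
save `cases'
restore

* pair every person with every case, then keep only eligible controls:
* alive and not yet exposed on the case's exposure date
* (missing dates pass, because Stata treats missing as larger than any number)
rename (id expdate age deathdate) (ctrl_id ctrl_expdate ctrl_age ctrl_deathdate)
cross using `cases'
keep if ctrl_id != case_id & ctrl_deathdate > case_expdate & ctrl_expdate > case_expdate

* keep the nearest-age control for each case (ties broken by ctrl_id)
gen double agegap = abs(ctrl_age - case_age)
bysort case_id (agegap ctrl_id): keep if _n == 1
list case_id ctrl_id case_age ctrl_age agegap in 1/5
```

Note that -cross- builds every person-by-case combination, so a cohort of 21,000 with 1,000 exposed would give 21 million rows; this is only practical if memory allows. A control who is exposed later would presumably also need to be censored at their own exposure date in the analysis.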
If not, you could use an iterative loop to match exposed patients until they've all been matched to a living unexposed control. Here is an example. It's probably not the most efficient way of doing it but it still doesn't take too long.
Phil
* example cohort data
* 1000 exposed and 20000 unexposed people
* study period is 01jan2005 to 31dec2009
clear
set obs 21000
gen id=_n
* random date of exposure during the study period for first 1000 people
set seed 12345
gen expdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=1000
format %td expdate
* in this simulation we'll assume that the exposed patients are 1.2x as likely
* to die as the unexposed
* 20% of the unexposed will die, so 24% of the exposed (240 of 1000) get a
* death date - but if it falls on or before their expdate we discard it
gen deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if id<=240
replace deathdate=. if deathdate<=expdate
* random death date for 20% of the other 20000 people
* (assume the rest are alive at the end of 2009)
replace deathdate=td(01jan2005) + trunc(runiform()*5*365.25) if inrange(id, 1001, 5000)
format %td deathdate
* age at start of study
gen age=rnormal(50, 5)
**** we have now finished constructing our dummy dataset ****
* now we classify patients as ever-exposed vs never-exposed
gen byte exposed=!missing(expdate)
* we need to know who's available for each match run
* at the start everyone is available
gen byte available=1
* pair is a new variable indicating each matched pair
* initially it's missing for everyone
gen pair=.
* now iteratively match until everyone's been matched to 1 neighbour
local finished=0
while `finished'==0 {
    * randomly sort the data before running -psmatch2- (in case of ties)
    gen double rsort=runiform()
    sort rsort
    drop rsort
    psmatch2 exposed if available, pscore(age) n(1) noreplace
    * -psmatch2- creates an ID variable for everyone called _id
    * for matched treated patients, the untreated match's ID is stored in
    * the treated patient's _n1 variable
    * anyone matched has a _weight of 1
    sort _id
    * the pair becomes the control's ID for exposed cases whose control was
    * still alive at the exposure date (a missing deathdate counts as alive,
    * because Stata treats missing as larger than any number)
    * these cases are then no longer "available" for future matching
    quietly replace pair=id[_n1] if exposed & _weight==1 & deathdate[_n1]>expdate
    quietly replace available=0 if exposed & _weight==1 & deathdate[_n1]>expdate
    * now loop through the matched controls and set their pair variable to
    * their own ID (there is probably a more efficient way to do this)
    quietly levelsof _n1 if exposed & _weight==1 & deathdate[_n1]>expdate, local(matches)
    foreach ctrl of local matches {
        quietly replace pair=id if _id==`ctrl'
    }
    * anyone who was a matched control in this run should be made unavailable
    * for further matching
    * (to prevent endlessly matching exposed patients with dead ones)
    quietly replace available=0 if !exposed & _weight==1
    * see if we need to do any more match runs
    quietly count if exposed & available
    display "Patients still needing a match: " r(N)
    if r(N)==0 local finished=1
}
* pairs now have a "pair" variable; these are the observations we want to use
gen byte touse=!missing(pair)
* confirm that we now have 1000 pairs
tab exposed if touse
* confirm that the ages are well matched
tabstat age if touse, by(exposed) s(n mean sd q)
bysort pair (exposed): gen agediff=age[1] - age[2] if touse
sum agediff, d
* confirm that the death dates of the controls are after the exposure dates
bysort pair (exposed): assert deathdate[1]>expdate[2] if touse
* now we can set up a survival analysis
* start date is the date of exposure and end date is death or 31dec2009
bysort pair (exposed): gen start=expdate[2] if touse
gen end=deathdate
replace end=td(31dec2009) if missing(end)
gen byte died=!missing(deathdate)
stset end, fail(died) origin(time start) scale(365.25) if(touse)
sts graph, by(exposed)
stcox exposed
On 21/08/2013, at 6:49 AM, "Smit, Menno" <[email protected]> wrote:
> Dear all,
>
> I am analysing data from a large cohort study in which some individuals become exposed during the 5 year observation period. For each exposed individual, how can I identify the nearest age-matched, unexposed individual that is alive on the date that the exposed become exposed?
>
> Many thanks,
> Menno
>
> MD in Tropical Medicine “Mother & Child Health”
> Research Assistant in Malaria Epidemiology
> KEMRI/CDC, P.O.Box 1578, Kisumu 40100, Kenya.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/