Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Combining multiple imputation with propensity score matching
From: David Kantor <[email protected]>
To: [email protected]
Subject: Re: st: Combining multiple imputation with propensity score matching
Date: Tue, 02 Mar 2010 12:24:12 -0500
Hi.
As the author of mahapick, I would like to mention that, indeed, it
does not pick unique matches. (This could be an avenue for future development.)
You can ask it to generate multiple match candidates for each
primary ("treated") case; this is effectively a queue of possible
matches, in order of closeness. You can then take this and run a
loop that visits the primary cases in a random order. For each such
case, select the best remaining candidate, then remove that selected
match as a candidate for use in later passes through the loop.
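The loop described above can be sketched as follows. This is only an illustration in Python, not Stata code and not part of mahapick; the function name pick_unique_matches, the dictionary layout, and the toy ids are all assumptions made for the example.

```python
import random

def pick_unique_matches(candidates, seed=0):
    """Greedy unique matching (hypothetical helper, not mahapick).

    candidates maps each primary id to a list of control ids ordered
    from closest to farthest, i.e. the queue for that case.
    Returns a dict: primary id -> chosen control id, or None if every
    candidate in its queue was already taken.
    """
    rng = random.Random(seed)
    order = list(candidates)
    rng.shuffle(order)              # visit primary cases in a random order
    used = set()                    # controls no longer available
    chosen = {}
    for case in order:
        chosen[case] = None
        for control in candidates[case]:
            if control not in used:
                chosen[case] = control   # best remaining candidate
                used.add(control)        # disqualify it for later cases
                break
    return chosen

# Toy queues (made-up ids), closest control first.
queues = {"t1": ["c1", "c2"], "t2": ["c1", "c3"], "t3": ["c2", "c1"]}
matches = pick_unique_matches(queues, seed=1)
```

A primary case can end up unmatched (None) when its whole queue has been used up, so a longer queue per case makes that less likely.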
I recommend that, if you want more than one match (say 3) per primary
case, you run this loop several (3) times (maintaining the same
data structure that disqualifies candidates from future matching) --
rather than selecting, say, the best 3 matches for each case in one
pass through the loop. The latter method might let the cases visited
early in the loop grab all the better matches, leaving poorer ones
for the cases visited later.
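A minimal sketch of the several-passes idea, again in Python rather than Stata. The name pick_in_rounds and the data layout are hypothetical, and reshuffling the visiting order on every pass is an assumption; the text above only says to keep the same record of used candidates across passes.

```python
import random

def pick_in_rounds(candidates, k=3, seed=0):
    """Give each primary case up to k unique matches, one per pass.

    candidates maps each primary id to control ids ordered from
    closest to farthest. Hypothetical helper, not mahapick.
    """
    rng = random.Random(seed)
    used = set()                          # shared across all passes
    chosen = {case: [] for case in candidates}
    for _ in range(k):                    # one pass per desired match
        order = list(candidates)
        rng.shuffle(order)                # assumption: new random order each pass
        for case in order:
            for control in candidates[case]:
                if control not in used:
                    chosen[case].append(control)
                    used.add(control)
                    break
    return chosen

# Two cases competing for the same three controls (made-up ids).
scarce = {"a": ["c1", "c2", "c3"], "b": ["c1", "c2", "c3"]}
result = pick_in_rounds(scarce, k=3, seed=0)
```

In this toy case each primary gets one control in the first pass and only the third control is left for the second pass, whereas a single pass taking the best 3 would hand all three controls to whichever case is visited first.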
Of course, this has a random element to the process. You may or may
not like that. But you need some way of deciding who gets a given
candidate if it is matched to more than one primary case.
I had done this selection process once, several years ago; I might be
able to dig up the code if necessary. My co-worker also had a plan to
somehow optimize the process by swapping matches in order to minimize
the sum of the distances. That was too complex to be done in Stata,
and we abandoned it. I understand that the task was taken up by
others (in C, I suppose), but the result was no better than the
original random process.
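For illustration only, one way "swapping matches" to reduce the sum of the distances could look is a pairwise exchange: swap the controls of two primary cases whenever that lowers their combined distance, and repeat until no swap helps. This is a guess at the general idea, not the plan or code referred to above; improve_by_swaps and the distance table are hypothetical.

```python
def improve_by_swaps(match, dist):
    """Pairwise-swap heuristic (hypothetical, one-to-one matches only).

    match: dict primary id -> control id.
    dist:  dict (primary id, control id) -> distance; it must contain
           every pairing of a primary with any matched control.
    Returns a new dict whose total distance is no larger than the input's.
    """
    match = dict(match)                   # do not modify the caller's dict
    cases = list(match)
    improved = True
    while improved:                       # stop when a full sweep finds no gain
        improved = False
        for i in range(len(cases)):
            for j in range(i + 1, len(cases)):
                a, b = cases[i], cases[j]
                current = dist[a, match[a]] + dist[b, match[b]]
                swapped = dist[a, match[b]] + dist[b, match[a]]
                if swapped < current:
                    match[a], match[b] = match[b], match[a]
                    improved = True
    return match

# Made-up distances: t1 is close to both controls, t2 is far from c2.
dist = {("t1", "c1"): 1, ("t1", "c2"): 2, ("t2", "c1"): 5, ("t2", "c2"): 9}
start = {"t1": "c1", "t2": "c2"}          # total distance 10
better = improve_by_swaps(start, dist)    # total distance 7 after one swap
```

A heuristic like this only reaches a local optimum; it does not guarantee the smallest possible sum.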
HTH
--David
At 11:17 AM 3/2/2010, John E. Cornell wrote:
Dear Stata Folks:
I have a large and somewhat complicated multi-site dataset that
requires the use of multiple imputation to fill in missing lab
values that I need to generate propensity scores for three classes
of drugs. I used the new multiple imputation procedure based on
multivariate normal regression to fill in the missing lab values. We
created 20 imputed datasets in the flong format, and used logistic
regression to compute and save the propensity scores in logit form
within each imputed set. We used mahapick to match cases
(being on one or more of the three agents) to controls (never on any
of the three agents). This worked well, but there are two problems
we encountered at this stage. First, the procedure selects the
closest match, but the actual distance may be very large, so we needed
to edit the matches to maintain a subset of cases with reasonable
closeness. Second, the procedure may match the same control to more
than one case, so we needed to restrict the sample to unique matches.
Finally, the number of matches varied between imputed sets.
It does not appear that the mi estimate command can handle this
situation. So, we are left with the prospect of writing our own code
to compute and combine the model estimates. We are relatively novice
Stata programmers at the moment, and we would welcome any
suggestions, references, etc. that the Stata community could provide
that will help us solve this problem.
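As a rough illustration of what combining the estimates by hand involves, the standard combining rules for multiple imputation (Rubin's rules) pool a point estimate and its variance across the m imputed sets. The sketch below is Python, not Stata; pool_rubin and the numbers are made up for the example. It shows only the arithmetic and does not settle whether pooling is appropriate when the matched sample differs between imputed sets.

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates: the m point estimates, one per imputed set.
    variances: the m squared standard errors, one per imputed set.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)               # pooled point estimate
    within = mean(variances)              # average within-imputation variance
    between = variance(estimates)         # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, sqrt(total)

# Made-up results from three imputed sets.
est, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```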
Cheers,
John E. Cornell, Ph.D.
Professor
Department of Epidemiology and Biostatistics
University of Texas Health Science Center, San Antonio
7703 Floyd Curl Drive
San Antonio, Texas 78229-3900
[...]