Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | László Sándor <sandorl@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: mata function for "lookup" or find rank if observation not in the ranked sample |
Date | Tue, 15 May 2012 23:32:59 -0400 |
This might be very useful for any one-dimensional matching, I achieved a ten-times speedup (with N=5000). Look-up of neighbors is very easy with permutation vectors, my previous email was only confused with a baroque convolution of orderings and inverse orderings. Actually, I can easily generate a "rank" vector for opposite treatments too, and then things work as intended. Please see below if interested. I am glad with the features of Stata and Mata, and grateful for the support from this list. Thanks, Laszlo mata: ... // Out of the loop initial preparation: // See new matching code! psc0order0 = order(psc0,1) psc0invorder0 = invorder(psc0order0) psc1order1 = order(psc1,1) psc1invorder1 = invorder(psc1order1) // For matching cross-group: // See new matching code! pscorder = order(psc,1) tsorted = t[pscorder] closest1to = runningsum(tsorted) for (i=1;tsorted[i]==0;i++) continue // never say closest treated is rank 0 closest1to[|1,1 \ i,1|] = J(i,1,1) closest0to = runningsum(1:-tsorted) for (i=1;tsorted[i]==1;i++) continue // never say closest untreated is rank 0 closest0to[|1,1 \ i,1|] = J(i,1,1) closest1to0 = select(closest1to,1:-tsorted) // This in the order of psc within treated closest0to1 = select(closest0to,tsorted) // This in the order of psc within untreated closest0to1 = closest0to1[psc1invorder1] // This is immediately the rank/invorder of the closest opposite closest1to0 = closest1to0[psc0invorder0] // This is immediately the rank/invorder of the closest opposite // Then an instance of matching among own treatment goes like this: // Matching treated to treateds for ( lcl = L; lcl<= 2*n1;lcl=2* lcl) { if (lcl > n1) { minindex(abs(psc1:-psc1[i]),L,yki,ykw) break } else { matchcandidateindices = psc1order1[| max((psc1invorder1[i]-lcl,1)),1 \ min((psc1invorder1[i]+lcl,n1)),1|] minindex(abs(psc1[matchcandidateindices]:-psc1[i]),L,yki,ykw) if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) { continue } else { yki = matchcandidateindices[yki] break } } } // And similarly cross-treatment: // Matching treated to untreateds for ( lcl = M; lcl<= 2*n0;lcl=2* lcl) { if (lcl > n0) { minindex(abs(psc0:-psc1[i]),M,yki,ykw) break } else { matchcandidateindices = psc0order0[| max((closest0to1[i]-lcl,1)),1 \ min((closest0to1[i]+lcl,n0)),1|] minindex(abs(psc0[matchcandidateindices]:-psc1[i]),M,yki,ykw) if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) { continue } else { yki = matchcandidateindices[yki] break } } } ... end On Mon, May 14, 2012 at 10:31 PM, László Sándor <sandorl@gmail.com> wrote: > > To follow-up on this: > > I figured out a way with permutation functions, though it is messy > (FWIW, I paste it below). > > More importantly, I was surprised to see that even in samples of 50K, > I suffered significant performance losses compared to minindexing over > entire N-vectors. I know minindex is built-in, pure C etc., but is it > reasonable that it can beat simple (well, simplish) lookup and some > logical checks, and a minindex over a much shorter vector? > > I would very much appreciate some StataCorp comments on this. I > invested a bit into this new approach (more complicated from > cross-treatment matching) for coding (one-dimensional) matching, and > I'm surprised by the result. > > Thanks, > > Laszlo > > ***** Code with some explanation: > instead of simply using this to match treated observations close to > the treated in question: > minindex(abs(psc1:-psc1[i]),L,yki,ykw) > > I do some work first out of the loop: > psc1order1 = order(psc1,1) > psc1invorder1 = invorder(psc1order1) > > and then guess that the closest L (plus ties) might be in the 2L > closest observation below or above the rank of the current observation > in the ranking: > for ( lcl = 2*L; lcl<= 2*n1;lcl=2* lcl) { > if (lcl > n1) { > minindex(abs(psc1:-psc1[i]),L,yki,ykw) > break > } > else { > matchcandidateindices = psc1order1[| > max((psc1invorder1[i]-lcl,1)),1 \ min((psc1invorder1[i]+lcl,n1)),1|] > minindex(abs(psc1[matchcandidateindices]:-psc1[i]),L,yki,ykw) > if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) { > continue > } > else { > yki = matchcandidateindices[yki] > break > } > } > } > > > > On Mon, May 14, 2012 at 11:04 AM, László Sándor <sandorl@gmail.com> wrote: > > Hi, > > I have another puzzle in Mata for Stata 10 and above. My previous > > thread on this list help me match, say, treated observations to > > treated observations, as I look up observations with propensity score > > close to the observation's in a (selected) vector that the observation > > itself is part of. The use of permutation vectors can be a dramatic > > improvement compared to many-many runs of -mindex- on the full > > propensity score vector. > > > > However, just as important would be to find observations with similar > > propensity scores in the subsample with opposite treatment. The > > observation itself is not part of that subsample, so I cannot simply > > look up its own rank there in a permutation vector I generate only > > once before I loop through all observations. Somehow I would need to > > know the rank of, say, a treated observation in the subsample of the > > control group. Actually, to really spare me costly runs on the entire > > control-prop.score-vector for each (treated) iteration, this lookup > > would better use some information I can generate for the whole > > population and the ranks there and/or in the two subsamples. > > > > Please let me know if you have thoughts about this. > > > > Thanks! > > > > Laszlo > > * > > * For searches and help try: > > * http://www.stata.com/help.cgi?search > > * http://www.stata.com/support/statalist/faq > > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/