Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: mata function for "lookup" or find rank if observation not in the ranked sample

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: mata function for "lookup" or find rank if observation not in the ranked sample
Date	Tue, 15 May 2012 23:32:59 -0400

This might be very useful for any one-dimensional matching, I achieved
a ten-times speedup (with N=5000). Look-up of neighbors is very easy
with permutation vectors, my previous email was only confused with a
baroque convolution of orderings and inverse orderings. Actually, I
can easily generate a "rank" vector for opposite treatments too, and
then things work as intended. Please see below if interested.

I am glad with the features of Stata and Mata, and grateful for the
support from this list.

Thanks,

Laszlo

mata:
...
// Out of the loop initial preparation:
// See new matching code!
psc0order0 = order(psc0,1)
psc0invorder0 = invorder(psc0order0)
psc1order1 = order(psc1,1)
psc1invorder1 = invorder(psc1order1)
// For matching cross-group:
// See new matching code!
pscorder = order(psc,1)
tsorted = t[pscorder]
closest1to = runningsum(tsorted)
for (i=1;tsorted[i]==0;i++) continue // never say closest treated is rank 0
closest1to[|1,1 \ i,1|] = J(i,1,1)
closest0to = runningsum(1:-tsorted)
for (i=1;tsorted[i]==1;i++) continue // never say closest untreated is rank 0
closest0to[|1,1 \ i,1|] = J(i,1,1)
closest1to0 = select(closest1to,1:-tsorted) // This in the order of
psc within treated
closest0to1 = select(closest0to,tsorted) // This in the order of psc
within untreated
closest0to1 = closest0to1[psc1invorder1] // This is immediately the
rank/invorder of the closest opposite
closest1to0 = closest1to0[psc0invorder0] // This is immediately the
rank/invorder of the closest opposite

// Then an instance of matching among own treatment goes like this:
// Matching treated to treateds
	for ( lcl = L; lcl<= 2*n1;lcl=2* lcl) {
		if (lcl > n1) {
			minindex(abs(psc1:-psc1[i]),L,yki,ykw)
			break
		}
		else {
			matchcandidateindices = psc1order1[|
max((psc1invorder1[i]-lcl,1)),1 \ min((psc1invorder1[i]+lcl,n1)),1|]
			minindex(abs(psc1[matchcandidateindices]:-psc1[i]),L,yki,ykw)
			if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) {
				continue
			}
			else {
				yki = matchcandidateindices[yki]
				break
			}
		}
	}

// And similarly cross-treatment:
// Matching treated to untreateds
	for ( lcl = M; lcl<= 2*n0;lcl=2* lcl) {
		if (lcl > n0) {
			minindex(abs(psc0:-psc1[i]),M,yki,ykw)
			break
		}
		else {
			matchcandidateindices = psc0order0[| max((closest0to1[i]-lcl,1)),1
\ min((closest0to1[i]+lcl,n0)),1|]
			minindex(abs(psc0[matchcandidateindices]:-psc1[i]),M,yki,ykw)
			if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) {
				continue
			}
			else {
				yki = matchcandidateindices[yki]
				break
			}
		}
	}
...
end

On Mon, May 14, 2012 at 10:31 PM, László Sándor <[email protected]> wrote:
>
> To follow-up on this:
>
> I figured out a way with permutation functions, though it is messy
> (FWIW, I paste it below).
>
> More importantly, I was surprised to see that even in samples of 50K,
> I suffered significant performance losses compared to minindexing over
> entire N-vectors. I know minindex is built-in, pure C etc., but is it
> reasonable that it can beat simple (well, simplish) lookup and some
> logical checks, and a minindex over a much shorter vector?
>
> I would very much appreciate some StataCorp comments on this. I
> invested a bit into this new approach (more complicated from
> cross-treatment matching) for coding (one-dimensional) matching, and
> I'm surprised by the result.
>
> Thanks,
>
> Laszlo
>
> ***** Code with some explanation:
> instead of simply using this to match treated observations close to
> the treated in question:
> minindex(abs(psc1:-psc1[i]),L,yki,ykw)
>
> I do some work first out of the loop:
>        psc1order1 = order(psc1,1)
>        psc1invorder1 = invorder(psc1order1)
>
> and then guess that the closest L (plus ties) might be in the 2L
> closest observation below or above the rank of the current observation
> in the ranking:
>        for ( lcl = 2*L; lcl<= 2*n1;lcl=2* lcl) {
>                if (lcl > n1) {
>                        minindex(abs(psc1:-psc1[i]),L,yki,ykw)
>                        break
>                }
>                else {
>                        matchcandidateindices = psc1order1[|
> max((psc1invorder1[i]-lcl,1)),1 \ min((psc1invorder1[i]+lcl,n1)),1|]
>                        minindex(abs(psc1[matchcandidateindices]:-psc1[i]),L,yki,ykw)
>                        if ( anyof(yki,1) | anyof(yki,rows(matchcandidateindices))) {
>                                continue
>                        }
>                        else {
>                                yki = matchcandidateindices[yki]
>                                break
>                        }
>                }
>        }
>
>
>
> On Mon, May 14, 2012 at 11:04 AM, László Sándor <[email protected]> wrote:
> > Hi,
> > I have another puzzle in Mata for Stata 10 and above. My previous
> > thread on this list help me match, say, treated observations to
> > treated observations, as I look up observations with propensity score
> > close to the observation's in a (selected) vector that the observation
> > itself is part of. The use of permutation vectors can be a dramatic
> > improvement compared to many-many runs of -mindex- on the full
> > propensity score vector.
> >
> > However, just as important would be to find observations with similar
> > propensity scores in the subsample with opposite treatment. The
> > observation itself is not part of that subsample, so I cannot simply
> > look up its own rank there in a permutation vector I generate only
> > once before I loop through all observations. Somehow I would need to
> > know the rank of, say, a treated observation in the subsample of the
> > control group. Actually, to really spare me costly runs on the entire
> > control-prop.score-vector for each (treated) iteration, this lookup
> > would better use some information I can generate for the whole
> > population and the ranks there and/or in the two subsamples.
> >
> > Please let me know if you have thoughts about this.
> >
> > Thanks!
> >
> > Laszlo
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: mata function for "lookup" or find rank if observation not in the ranked sample
  - From: László Sándor <[email protected]>
- Re: st: mata function for "lookup" or find rank if observation not in the ranked sample
  - From: László Sándor <[email protected]>

Prev by Date: st: results viewer not outputting from the do-file editor
Next by Date: Re: st: Statistical tests under heteroskedasticity
Previous by thread: Re: st: mata function for "lookup" or find rank if observation not in the ranked sample
Next by thread: st: significant gof with poisson in large N regression
Index(es):
- Date
- Thread