I lied. Here's another Mata issue.
In nonparametric statistics (kernel density estimation,
for example), certain computations often have to be
repeated many times, say, once for each position of an
observation window sliding over the data. This makes
nonparametric methods slow.
I am currently working on kernel density estimation
for bounded variables. The Mata programs I wrote for
these purposes work fine and are relatively fast
(compared to the official -_KDE-, which is used by
Stata's -kdensity- command). However, compared to, say,
-regress-, they are slow.
I think that the programs could be made more efficient
if there were a fast way to select the relevant portion of
the data (i.e. the data within the current observation
window) before each computation (especially if large
datasets or complicated estimation functions are used).
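To make the cost concrete, here is a minimal sketch of the
kind of computation meant. It is not one of the programs
mentioned above: the function name kde_at(), its arguments,
and the choice of the Epanechnikov kernel on [-1,1] are
assumptions made only for illustration.
. mata:
: real scalar kde_at(real colvector x, real scalar x0, real scalar h)
> {
>         real colvector z
>
>         // standardized distances of ALL observations from x0
>         z = (x :- x0) :/ h
>         // Epanechnikov kernel; the 0/1 mask zeroes out the
>         // observations outside the window (x0-h, x0+h)
>         return(sum(.75 :* (1 :- z:^2) :* (abs(z) :< 1)) / (rows(x)*h))
> }
: end
Every call works through all rows of x, although only the
observations within h of x0 contribute to the result. That
is the work a fast selection step would avoid.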
Consider a vector u = uniform(500000,1) and suppose you
want to save as a new vector all elements of u within
[.5,.8]. The fastest way I could come up with to do
this is
. mata:
------ mata (type end to exit) ---------------------
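: real colvector submat(real colvector I)
> {
>         real scalar i, j
>
>         // I is a 0/1 indicator; overwrite its first j elements
>         // with the row numbers of the selected observations
>         j = 0
>         for (i=1; i<=rows(I); i++) {
>                 if (I[i]==1) I[++j] = i
>         }
>         I = I[|1 \ j|]   // keep only the j row numbers found
>         return(I)
> }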
: end
----------------------------------------------------
r; t=0.03 16:18:57
. mata: u = uniform(500000,1)
r; t=0.02 16:18:57
. mata: u2 = u[submat((u:>=.5:&u:<=.8))]
r; t=0.43 16:18:57
which, however, is still slow: 0.43 seconds, compared with
0.02 seconds to create the random vector itself.
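For comparison, the same subset can be written without an
explicit loop in Stata releases whose Mata provides the
built-in select() function. That is an assumption about the
installed version, since select() is not available in every
release. The lines below are only a sketch and have not been
timed here:
. mata: u = uniform(500000,1)
. mata: u2 = select(u, (u:>=.5 :& u:<=.8))
select(X, v) returns the rows of X for which the
corresponding element of v is nonzero, so u2 should hold
the same elements as the submat() version above.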
Does anyone have an idea how this could be sped up? Can
this be made any faster at all using Mata commands, or
would special functions written in C be needed?
Thanks,
ben