Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: xt: unit-specific trends
From
László Sándor <[email protected]>
To
[email protected]
Subject
Re: st: xt: unit-specific trends
Date
Thu, 19 Apr 2012 13:11:31 -0400
I got curious about the Mata solution. Below is my code for a clumsy
Stata command to do that, and the test code preceding it. It was
really fast with a million observations under Stata 12.1 MP for mac.
Please let me know if it does something wrong, or it could be
improved. I think most if its performance will depend on the
panelsubview implementation…
Thanks again,
Laszlo
clear all
discard
set obs 1000000
g i = int(_n/10)
g t = mod(_n,10)
g bump = t > 5
g u = runiform()
g y = 5 + 2*t + 10*bump
g ry = .
detrendbump = y, trend(t) bump(bump) resid(ry) by(i)
exit
program define detrendbump
version 11
syntax =/exp [if] [in], trend(varname) bump(varname) resid(varname)
by(varname)
marksample touse
mata: _detrendbump("`exp' `bump' `trend' `resid'","`by'","`touse'")
end
mata:
mata set matastrict on
mata set matalnum off, permanently
mata set matafavor speed
void _detrendbump(string scalar a, ///
string scalar by,
string scalar touse)
{
real matrix M, X, pid, V, info
real colvector y, xmean, ymean, b
real scalar i
st_view(pid, ., by, touse)
st_view(V, .,a, touse)
info = panelsetup(pid, 1)
for (i=1; i<=rows(info); i++) {
panelsubview(M,V, i, info)
y = M[.,1]
X = M[.,2\3]
xmean = mean(X, 1)
ymean = mean(y, 1)
b = invsym(quadcrossdev(X,xmean , X,xmean))*quadcrossdev(X,xmean
, y,ymean)
M[.,4] = y - X*b + b[1]*X[.,1] :-ymean :+ xmean*b
}
}
end
exit
2012/4/19 László Sándor <[email protected]>
>
> Thanks, Nick, Austin,
>
> quickly on this: Before I rewrite my code again, I would appreciate a
> comment from StataCorp. I used "if `touse'" because that is the
> official way to make a program byable
> (http://www.stata.com/help.cgi?byable). If there is any case where the
> -if- condition need not be checked for the entire dataset, a -by: -
> run is that, isn't it? (By the way, as a -by- always run when sorted
> on the "by" variable, it is really prone to range subscripts.) So I
> expect that Stata takes care of that when runs a byable command with
> -bys: -.
>
> I am not sure my Mata code could be much better than a well-written
> -by:-, so first I would hear about whether -bys: - is written well. :)
> And my impression is that to drop _N-10 observations but then restore
> after a preserve is prohibitively costly.
>
> Thanks,
>
> Laszlo
>
> On Thu, Apr 19, 2012 at 9:53 AM, Nick Cox <[email protected]> wrote:
> >
> > As I understand it:
> >
> > Work with -if- is really dumb. Stata tests every observation. Even if
> > you say -if _n < 7- there is no intelligence saying "Oh, we're done
> > here" once you get to observation 7. How could there be? In that
> > example and some others working with -in- is a much, much better idea.
> > However, you're working with -if `touse'- here. One alternative is
> >
> > preserve
> > keep if `touse'
> > ...
> > restore
> >
> > but you still have to use -if-. It could be that
> >
> > replace `touse' = -`touse'
> > sort `touse'
> > qui count if `touse'
> > local last = r(N)
> >
> > and then doing things
> >
> > ... in 1/`last'
> >
> > helps.
> >
> > As for other stuff, I think most of the answers would be "It depends".
> >
> > On what MP does and doesn't do, see
> > http://www.stata.com/statamp/statamp.pdf
> >
> > Over time, the Mata content of Stata is going up, but many of the
> > basic operations are done using compiled C.
> >
> > I think Stata tech-support would need much more information than this
> > on your data, your code and your machine before they could make many
> > useful comments.
> >
> > More positively, you could use -set rmsg on- to see what is going most
> > slowly.
> >
> > Nick
> >
> > 2012/4/19 László Sándor <[email protected]>:
> > > Quick comments on this:
> > >
> > > I forgot to flag that the residual variable need to exist beforehand
> > > for -genbump- below, this is only replacing values of it.
> > >
> > > More importantly: The operation is still far, far from linear in the
> > > number of individuals (N in the panel — T is fixed). I could again
> > > finish a 1% subsample in around 10 minutes or so, but my bold attempt
> > > at 10% overnight still only finished 4 out of the 8 variables to be
> > > transformed this way in 10 or 11 hours.
> > >
> > > Maybe caching and memory is an issue here, but if anybody (StataCorp?)
> > > had a comment on this otherwise, that would be helpful.
> > >
> > > Maybe firing up _regress and _predict all the time is very costly? Or
> > > the marksample is not fast enough with the by option? (Does the code
> > > know that once it finished with seven consecutive rows there is
> > > nothing to check further below "whether" `touse' is 1 anywhere else? I
> > > guessed byable commands produce efficient subscripting for some
> > > underlying Mata code…) Or even the byable command does not use MP
> > > resources efficiently? (Still, even remaining serial, the speed-up
> > > could be much closer to linear, no?)
> > >
> > > I thought individual-specific trends are almost as trendy nowadays as
> > > fixed-effects — I wonder if they could be done much faster.
> > >
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/