Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | László Sándor <sandorl@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: xt: unit-specific trends |
Date | Thu, 19 Apr 2012 13:11:31 -0400 |
I got curious about the Mata solution. Below is my code for a clumsy Stata command to do that, and the test code preceding it. It was really fast with a million observations under Stata 12.1 MP for mac. Please let me know if it does something wrong, or it could be improved. I think most if its performance will depend on the panelsubview implementation… Thanks again, Laszlo clear all discard set obs 1000000 g i = int(_n/10) g t = mod(_n,10) g bump = t > 5 g u = runiform() g y = 5 + 2*t + 10*bump g ry = . detrendbump = y, trend(t) bump(bump) resid(ry) by(i) exit program define detrendbump version 11 syntax =/exp [if] [in], trend(varname) bump(varname) resid(varname) by(varname) marksample touse mata: _detrendbump("`exp' `bump' `trend' `resid'","`by'","`touse'") end mata: mata set matastrict on mata set matalnum off, permanently mata set matafavor speed void _detrendbump(string scalar a, /// string scalar by, string scalar touse) { real matrix M, X, pid, V, info real colvector y, xmean, ymean, b real scalar i st_view(pid, ., by, touse) st_view(V, .,a, touse) info = panelsetup(pid, 1) for (i=1; i<=rows(info); i++) { panelsubview(M,V, i, info) y = M[.,1] X = M[.,2\3] xmean = mean(X, 1) ymean = mean(y, 1) b = invsym(quadcrossdev(X,xmean , X,xmean))*quadcrossdev(X,xmean , y,ymean) M[.,4] = y - X*b + b[1]*X[.,1] :-ymean :+ xmean*b } } end exit 2012/4/19 László Sándor <sandorl@gmail.com> > > Thanks, Nick, Austin, > > quickly on this: Before I rewrite my code again, I would appreciate a > comment from StataCorp. I used "if `touse'" because that is the > official way to make a program byable > (http://www.stata.com/help.cgi?byable). If there is any case where the > -if- condition need not be checked for the entire dataset, a -by: - > run is that, isn't it? (By the way, as a -by- always run when sorted > on the "by" variable, it is really prone to range subscripts.) So I > expect that Stata takes care of that when runs a byable command with > -bys: -. > > I am not sure my Mata code could be much better than a well-written > -by:-, so first I would hear about whether -bys: - is written well. :) > And my impression is that to drop _N-10 observations but then restore > after a preserve is prohibitively costly. > > Thanks, > > Laszlo > > On Thu, Apr 19, 2012 at 9:53 AM, Nick Cox <njcoxstata@gmail.com> wrote: > > > > As I understand it: > > > > Work with -if- is really dumb. Stata tests every observation. Even if > > you say -if _n < 7- there is no intelligence saying "Oh, we're done > > here" once you get to observation 7. How could there be? In that > > example and some others working with -in- is a much, much better idea. > > However, you're working with -if `touse'- here. One alternative is > > > > preserve > > keep if `touse' > > ... > > restore > > > > but you still have to use -if-. It could be that > > > > replace `touse' = -`touse' > > sort `touse' > > qui count if `touse' > > local last = r(N) > > > > and then doing things > > > > ... in 1/`last' > > > > helps. > > > > As for other stuff, I think most of the answers would be "It depends". > > > > On what MP does and doesn't do, see > > http://www.stata.com/statamp/statamp.pdf > > > > Over time, the Mata content of Stata is going up, but many of the > > basic operations are done using compiled C. > > > > I think Stata tech-support would need much more information than this > > on your data, your code and your machine before they could make many > > useful comments. > > > > More positively, you could use -set rmsg on- to see what is going most > > slowly. > > > > Nick > > > > 2012/4/19 László Sándor <sandorl@gmail.com>: > > > Quick comments on this: > > > > > > I forgot to flag that the residual variable need to exist beforehand > > > for -genbump- below, this is only replacing values of it. > > > > > > More importantly: The operation is still far, far from linear in the > > > number of individuals (N in the panel — T is fixed). I could again > > > finish a 1% subsample in around 10 minutes or so, but my bold attempt > > > at 10% overnight still only finished 4 out of the 8 variables to be > > > transformed this way in 10 or 11 hours. > > > > > > Maybe caching and memory is an issue here, but if anybody (StataCorp?) > > > had a comment on this otherwise, that would be helpful. > > > > > > Maybe firing up _regress and _predict all the time is very costly? Or > > > the marksample is not fast enough with the by option? (Does the code > > > know that once it finished with seven consecutive rows there is > > > nothing to check further below "whether" `touse' is 1 anywhere else? I > > > guessed byable commands produce efficient subscripting for some > > > underlying Mata code…) Or even the byable command does not use MP > > > resources efficiently? (Still, even remaining serial, the speed-up > > > could be much closer to linear, no?) > > > > > > I thought individual-specific trends are almost as trendy nowadays as > > > fixed-effects — I wonder if they could be done much faster. > > > > > > > * > > * For searches and help try: > > * http://www.stata.com/help.cgi?search > > * http://www.stata.com/support/statalist/faq > > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/