Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: RE: st: Converting a SAS datastep to Stata
From
Austin Nichols <[email protected]>
To
[email protected]
Subject
Re: RE: st: Converting a SAS datastep to Stata
Date
Fri, 17 Dec 2010 13:32:24 -0500
Bill, Dan, et al.:
I initially thought of panelsetup() as well--and I definitely think
some of the processing should be done in Mata. But I think the
separate programs for separate years should be organized
differently--you define Mata functions for each year, and then a
single program invokes the appropriate Mata function for each year and
restricts the data as well.
The only issue is translating the SAS code into Mata, but this is
almost trivial--the code in each if block goes into a separate Mata
function and there are only a few translations to take care of.
Where you would code
_amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
you now code
_amt5pc = rowmin((c24533,rowmin((c24532,rowmin((c62700,c24517))))));
that is,
code rowmin((a,b)) for min(a,b): rename min to rowmin and double its
parentheses, which a global search and replace can mostly handle.
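Because rowmin() takes the minimum across all columns of its argument, the nested calls can also be collapsed into a single call. The small Mata check below is only a sketch: the vectors a, b and c are made-up stand-ins for the tax variables, not data from the thread. It also shows that rowmin() skips missing values, which is how SAS min() treats them.

```stata
mata:
// made-up stand-ins for three tax variables
a = (3 \ 1 \ 5)
b = (2 \ 4 \ .)
c = (6 \ 0 \ 7)
// nested form, as produced by the search-and-replace translation
nested = rowmin((a, rowmin((b, c))))
// equivalent single call over all columns
flat = rowmin((a, b, c))
assert(nested == flat)
// row 3 is 5, not missing: rowmin() ignores the missing value in b
assert(flat == (2 \ 0 \ 5))
end
```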
Where you would code
_amt5pc = max(0,_amt5pc);
you now code
_amt5pc = rowmax((J(rows(_amt5pc),1,0),_amt5pc));
that is, use a vector of zeros wherever the SAS code has a scalar 0. It is
probably easier to define z as a vector of zeros up front and replace every
such 0 with z. The semicolons are optional.
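A short Mata sketch of the max(0, x) translation, using made-up values for _amt5pc that include a missing one. The rowmax() form reproduces SAS max(0, .) = 0 because rowmax() ignores missing values. An elementwise colon-operator shortcut, shown only for contrast, is not equivalent: it leaves the missing value missing.

```stata
mata:
// made-up values standing in for _amt5pc
x = (-2 \ 3 \ .)
// define the vector of zeros once
z = J(rows(x), 1, 0)
// translation of SAS max(0, x); the missing value becomes 0
y = rowmax((z, x))
assert(y == (0 \ 3 \ 0))
// colon-operator shortcut: missing > 0 is true in Mata, so row 3 stays missing
w = x :* (x :> 0)
assert(w == (0 \ 3 \ .))
end
```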
Cut and paste this whole example into the Command window:
*set up some fake data for example
clear all
sysuse auto
keep in 1/10
ren price c24533
ren mpg c24532
ren turn c62700
ren trunk c24517
ren headroom e24583
ren weight e24515
ren length c24516
g yr=2000+floor(_n/3)
g id=_n
sort yr id
compress
list id yr c*, sepby(yr) noo
*now run example--note how everything in { } is very close to original SAS code
mata:
mata set matastrict off
void FLPDYR2003() {
external c24533,c24532,c24517,c24516,c62700,e24583,e24515
external id,_amt5pc,_amt8pc,_amt10pc,_amt20pc,_amt25pc
_amt5pc = rowmin((c24533,rowmin((c24532,rowmin((c62700,c24517))))));
z=J(rows(_amt5pc),1,0);
_amt5pc = rowmax((z,_amt5pc));
c62747 = .05*_amt5pc;
_line49 = rowmax((z,rowmin((c24532,rowmin((c24517,c62700))))-_amt5pc));
_line50 = rowsum(e24583,0);
_amt8pc = rowmin((_line49,_line50));
c62749 = .08*_amt8pc;
_amt10pc = _line49 - _amt8pc;
c62750 = .1*_amt10pc;
_line55 = c24533 - _amt5pc;
_line56 = rowmin((c24517,c62700)) - rowmin((c24532,rowmin((c24517,c62700))));
_amt15pc = rowmin((_line55,_line56));
c62755 = .15*_amt15pc;
_amt20pc = _line56 - _amt15pc;
c62760 = .2*_amt20pc;
_amt25pc = rowmin((c62700,rowmin((c24517+e24515,c24516))))-rowmin((c62700,c24517));
c62770 = .25*_amt25pc;
_tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
}
end
prog FLPDYR
syntax anything [if]
conf num `anything'
if !inrange(`anything',2000,2009) {
di as err "Year out of range"
error 198
}
putmata id c* e* `if', view replace
mata: FLPDYR`anything'()
getmata id _amt5pc _amt8pc _amt10pc _amt20pc _amt25pc, update id(id)
end
FLPDYR 2003 if yr==2003
list yr c* _amt*, sepby(yr) noo
The Mata could be a little less sloppy, but passing the year as an argument
and the if qualifier separately is intentional: I can see where you might
want to use data from 2003 but tax law from 2002, or what have you.
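A usage sketch of that design choice, meant to be run after the example above (it assumes FLPDYR and FLPDYR2003() are still defined and the fake data are in memory). It applies the 2003 rules to the 2002 observations. The final assert is an added sanity check, not part of the original: it compares _amt5pc with the same quantity computed directly by Stata's min() and max().

```stata
* apply the 2003 rules to the 2002 observations
FLPDYR 2003 if yr==2002
list yr c* _amt*, sepby(yr) noo
* sanity check against Stata's own multi-argument min() and max()
assert _amt5pc == max(0, min(c24533, c24532, c62700, c24517)) if yr==2002
```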
On Thu, Dec 16, 2010 at 5:10 PM, William Gould, StataCorp LP
<[email protected]> wrote:
> I wrote,
>
> WG> [...]that is what I would do, probably. With Mata, I can go
> WG> through the observations one at a time just as SAS does.
>
> Daniel Feenberg <[email protected]> replied,
>
> DF> Do you mean a "for" loop over observations?
> DF> [...]
> DF> Wouldn't that structure be subject to the complaint you voiced
> DF> about explicitly looping over observations? [...] If that
> DF> doesn't apply to Mata (perhaps because Mata is pseudo-compiled)
> DF> it would be very attractive.
>
> The stricture does not apply to Mata. More correctly, I never
> recommend explicitly looping over observations if you can avoid
> it, and that applies to Mata, and that applies to languages other
> than Stata and Mata, too, if the language provides an alternative
> method.
>
> In the case of Mata, it is faster than Stata, and explicitly looping over
> the observations often produces acceptable performance.
>
> If you were going to use Mata and explicitly loop over observations,
> I would recommend against using views.
>
> In this case, however, I can think of a way to write the procedure
> without looping over the data:
>
> 1. Put the data in year order, so all 1973 are together, all 1974
> are together, etc. Do that in Stata.
>
> 2. In Mata, construct a view onto the data.
>
> 3. Use function [M-5] panelsetup() to obtain the beginning and
> ending indices of each year.
>
> 4. For each value of year,
>
> a. Extract from the view matrix the submatrix for the year using
> range subscripts [|#,# \ #,#|]; see [M-2] subscripts.
> Store the result in a regular matrix.
>
> b. Pass said matrix to the year-specific Mata subroutine you
> write to make the calculation.
>
> c. In the year-specific subroutine, do not loop through the
> observations; instead use the appropriate colon operators;
> see [M-2] op_colon.
>
> 5. Now slam in one swoop the newly replaced values of variables
> back into the View using the same range subscripts [|#,#\#,#|]
> you used when extracting the submatrix. This time, the
> range subscripts will appear to the left of the equal-sign
> assignment operator.
>
> There are other approaches you could use, but what I outlined would
> be very fast.
>
> All of that said, you may very well get adequate performance using Mata
> and looping over the observations. It is not that what I just suggested
> would take longer to code than the explicit looping solution, it is merely
> that it assumes more familiarity with Mata and its advanced features.
> When breaking into Mata for the first time, it is usually best to stay
> with approaches with which you are familiar. One of the good features
> about Stata is that those approaches usually work well.
>
>
> -- Bill
> [email protected]
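Bill's five steps can be sketched in Mata roughly as follows. This is a minimal sketch under stated assumptions, not his code: it rebuilds a small version of the fake auto-based data from the example above, and the year-specific subroutine amt5pc_rule() is a hypothetical stand-in that computes only max(0, min(...)) for the first bracket. For brevity the same rule is applied to every year; in practice each year would call its own function. The loop runs over years, not observations, and each year's block is read and written with range subscripts.

```stata
* Step 1: fake data, sorted so that each year's rows are contiguous
clear all
sysuse auto
keep in 1/10
rename price c24533
rename mpg c24532
rename turn c62700
rename trunk c24517
generate yr = 2000 + floor(_n/3)
generate id = _n
generate double amt5 = .
sort yr id

mata:
// hypothetical year-specific subroutine: X is the N x 4 block
// (c24533, c24532, c62700, c24517) for one year; returns max(0, min(...))
// for all rows at once, without looping over observations (step 4c)
real colvector amt5pc_rule(real matrix X)
{
    return(rowmax((J(rows(X), 1, 0), rowmin(X))))
}

// Step 2: a view onto the data; column 1 is yr, column 6 is amt5
V = .
st_view(V, ., "yr c24533 c24532 c62700 c24517 amt5")

// Step 3: first and last row of each year
info = panelsetup(V, 1)

// Step 4: loop over years
for (i = 1; i <= rows(info); i++) {
    first = info[i, 1]
    last  = info[i, 2]
    // 4a: range subscripts copy this year's block into a regular matrix
    X = V[|first, 2 \ last, 5|]
    // 4b and step 5: compute, then write back through the view in one assignment
    V[|first, 6 \ last, 6|] = amt5pc_rule(X)
}
end

list yr c24533 c24532 c62700 c24517 amt5, sepby(yr) noo
```

Because V is a view, the assignment on the left-hand side of step 5 changes the Stata variable amt5 directly, so no getmata step is needed afterwards.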