Re: st: Converting a SAS datastep to Stata
From: "Joseph Coveney" <[email protected]>
To: <[email protected]>
Subject: Re: st: Converting a SAS datastep to Stata
Date: Thu, 16 Dec 2010 08:38:39 +0900
Daniel Feenberg wrote:
[snip]
Here is the SAS code for capital gains under the alternative minimum tax
for a single year:
if FLPDYR eq 2003 then do;
_amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
_amt5pc = max(0,_amt5pc);
c62747 = .05*_amt5pc;
_line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
_line50 = sum(e24583,0);
_amt8pc = min(_line49,_line50);
c62749 = .08*_amt8pc;
_amt10pc = _line49 - _amt8pc;
c62750 = .1*_amt10pc;
_line55 = c24533 - _amt5pc;
_line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
_amt15pc = min(_line55,_line56);
c62755 = .15*_amt15pc;
_amt20pc = _line56 - _amt15pc;
c62760 = .2*_amt20pc;
_amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
c62770 = .25*_amt25pc;
_tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
end;
[snip]
Repeating the if qualifier means repeating a calculation, which is an
inefficiency, but it also means repeating the code, which is ugly and
distracting. That is why I asked about the possibility of a block-level if
qualifier. If it doesn't exist, I'll put it in W. Gould's suggestion box.
One thing I could do is allow more complex assignment statements, with
fewer of the intermediate values that are used to clarify purpose and show
the correspondence to the tax form. That could reduce the number of
statements by half but is otherwise undesirable.
--------------------------------------------------------------------------------
I'll second Austin's suggestion to move this to Mata. This should be
straightforward in Mata, which can create updatable views and subviews onto
the dataset.
With SAS, the DATA step doesn't hold all of the data in memory; it scrolls
through the input file one logical record at a time, places the record's
contents into the program data vector, creates the variables or replaces
values in them (I can't tell which you're doing from the excerpt), writes the
record to the output file, and proceeds to the next logical record. (Hence
SAS users' perennial concern about I/O.) The IF-THEN DO block tests its
condition on each record as it is read in. If the condition isn't met, the
DATA step skips the statements in the block for that record, so the block
neither creates nor changes any values there (the record itself is still
written out), and moves on to the next logical record.
In Stata, by contrast, all of the data are in memory, and creating or
replacing values is "vectorized" (each -generate- or -replace- acts on every
observation that satisfies its -if- qualifier), so you won't get an
IF-block-style construct _in Stata_.* This is simply a consequence of the
different data models of SAS and Stata.
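For comparison, here is a minimal sketch of the usual vectorized Stata idiom for the first few lines of the SAS block. The condition is evaluated once into a 0/1 indicator (named touse here, a hypothetical name), and each -generate- carries -if touse-, which also leaves the new variables missing in the other years. The data values are invented for illustration. Note that wrapping the lines in Stata's -if- programming command (if FLPDYR == 2003 { ... }) is not a block-level qualifier: that command evaluates its expression once, using the first observation only.

```stata
* Hypothetical example data: variable names are from the SAS excerpt,
* but the values are made up for illustration only.
clear
input FLPDYR c24533 c24532 c62700 c24517 e24583
2002 100 200 300 400 50
2003 100 200 300 400 50
2003  10  20  30  40  5
end

* Evaluate the condition once and store it in an indicator variable.
generate byte touse = (FLPDYR == 2003)

* Stata's min() and max() accept several arguments; the first line below
* combines the first two SAS statements.
generate double _amt5pc = max(0, min(c24533, c24532, c62700, c24517)) if touse
generate double c62747  = .05*_amt5pc if touse
generate double _line49 = max(0, min(c24532, c24517, c62700) - _amt5pc) if touse

* SAS sum(e24583,0) treats a missing e24583 as 0. Stata's sum() is a
* running sum, so cond() is used here instead.
generate double _line50 = cond(missing(e24583), 0, e24583) if touse
generate double _amt8pc = min(_line49, _line50) if touse
generate double c62749  = .08*_amt8pc if touse
```

The indicator costs one extra variable but avoids re-evaluating the condition on every line; it can be dropped at the end with -drop touse-.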
But Mata can select blocks of observations (and variables) in the Stata
dataset and work on each block in isolation, in place. Mata's views and
subviews give you the very "block-level if qualifier" that you're seeking.
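A minimal sketch of that approach for the first two lines of the SAS block, under a few assumptions: the result variables are created beforehand (a view can only write into variables that already exist), and a hypothetical 0/1 indicator touse is passed as st_view()'s selectvar argument, so the views X and Y cover only the FLPDYR == 2003 observations. The data values are invented. st_subview() can carve a further block out of an existing view if needed.

```stata
* Hypothetical example data: variable names are from the SAS excerpt,
* but the values are made up for illustration only.
clear
input FLPDYR c24533 c24532 c62700 c24517
2002 100 200 300 400
2003 100 200 300 400
2003  10  20  30  40
end

* Result variables must exist before a view can write into them.
generate double _amt5pc = .
generate double c62747  = .
generate byte touse = (FLPDYR == 2003)

mata:
X = .
Y = .
// Views onto the inputs and outputs, restricted to obs. where touse != 0
st_view(X, ., ("c24533", "c24532", "c62700", "c24517"), "touse")
st_view(Y, ., ("_amt5pc", "c62747"), "touse")
// Row-wise minimum of the inputs, floored at zero
amt5pc = rowmax((J(rows(X), 1, 0), rowmin(X)))
// Assigning through the view's subscripts updates the Stata data in place
Y[., 1] = amt5pc
Y[., 2] = 0.05 :* amt5pc
end
```

Observations outside the view (here the 2002 record) are never touched, which is the behavior of the SAS IF-THEN DO block.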
Joseph Coveney
* Short of -use if FLPDYR == 2003 using ALLRETURNS-, -generate/replace . . .-,
and -save FLPDYR2003-, which would force a direct analogy with the SAS idiom.
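For completeness, a sketch of the forced analogy in the footnote, with temporary files standing in for ALLRETURNS and FLPDYR2003 and invented data. Unlike the SAS IF-THEN DO block, which passes the other records through, -use if- keeps only the 2003 records, so they would have to be recombined (for example with -append-) if the other years are needed.

```stata
* Hypothetical example data; temporary files stand in for ALLRETURNS and
* FLPDYR2003.
clear
input FLPDYR c24533 c24532 c62700 c24517
2002 100 200 300 400
2003 100 200 300 400
2003  10  20  30  40
end
tempfile allreturns flpdyr2003
save `allreturns'

* Read only the FLPDYR == 2003 records, compute, and save them separately.
use if FLPDYR == 2003 using `allreturns', clear
generate double _amt5pc = max(0, min(c24533, c24532, c62700, c24517))
generate double c62747  = .05*_amt5pc
save `flpdyr2003'
```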