RE: st: Converting a SAS datastep to Stata
From: Daniel Feenberg <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Converting a SAS datastep to Stata
Date: Wed, 15 Dec 2010 16:33:37 -0500 (EST)
On Wed, 15 Dec 2010, Nick Cox wrote:
> If I understand that SAS code correctly, and I've never used SAS in my
> life, an equivalent would be
>     gen lvalue1 = expr1 if flpdyr > 1993 & flpdyr < 1998
>     gen lvalue2 = expr2 if flpdyr > 1993 & flpdyr < 1998
> In fact
>     if flpdyr > 1993 & flpdyr < 1998
> could be translated to
>     if inrange(flpdyr, 1994, 1997)
> which isn't much shorter but is likely to match the way you think more
> closely. (Here I am taking it from context that year variables take
> integer values only.)
Yes, integers only, and the inrange() form is very clear. However,
consider that there are 18 lines of code for calculating the tax on
capital gains income in 2003, then 15 lines used only for 2004, and so on
for 21 years. While I personally blame Congress for the frequent tax law
changes, that isn't relevant for this mailing list.
Here is the SAS code for capital gains under the alternative minimum tax
for a single year:
if FLPDYR eq 2003 then do;
    _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
    _amt5pc = max(0,_amt5pc);
    c62747 = .05*_amt5pc;
    _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
    _line50 = sum(e24583,0);
    _amt8pc = min(_line49,_line50);
    c62749 = .08*_amt8pc;
    _amt10pc = _line49 - _amt8pc;
    c62750 = .1*_amt10pc;
    _line55 = c24533 - _amt5pc;
    _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
    _amt15pc = min(_line55,_line56);
    c62755 = .15*_amt15pc;
    _amt20pc = _line56 - _amt15pc;
    c62760 = .2*_amt20pc;
    _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
    c62770 = .25*_amt25pc;
    _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
end;
[The purpose of the code is to tax different assets at different rates,
where the rates also depend on the taxpayer's income, including capital
gains, and with non-symmetric treatment of losses.] I think this
translates into the following Stata code:
* gen assumes the variable does not exist yet; use replace once it does (e.g. for later years).
gen _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) if FLPDYR == 2003
replace _amt5pc = max(0,_amt5pc) if FLPDYR == 2003
gen c62747 = .05*_amt5pc if FLPDYR == 2003
gen _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc) if FLPDYR == 2003
* rowtotal() is an egen function; cond() reproduces SAS sum(e24583,0), which treats missing as 0.
gen _line50 = cond(missing(e24583),0,e24583) if FLPDYR == 2003
gen _amt8pc = min(_line49,_line50) if FLPDYR == 2003
gen c62749 = .08*_amt8pc if FLPDYR == 2003
gen _amt10pc = _line49 - _amt8pc if FLPDYR == 2003
gen c62750 = .1*_amt10pc if FLPDYR == 2003
gen _line55 = c24533 - _amt5pc if FLPDYR == 2003
gen _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700)) if FLPDYR == 2003
gen _amt15pc = min(_line55,_line56) if FLPDYR == 2003
gen c62755 = .15*_amt15pc if FLPDYR == 2003
gen _amt20pc = _line56 - _amt15pc if FLPDYR == 2003
gen c62760 = .2*_amt20pc if FLPDYR == 2003
gen _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517) if FLPDYR == 2003
gen c62770 = .25*_amt25pc if FLPDYR == 2003
gen _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770 if FLPDYR == 2003
Repeating the if qualifier means repeating a calculation, which is an
inefficiency, but it also means repeating the code, which is ugly and
distracting. That is why I asked about the possibility of a block level if
qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.
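
One possible workaround, offered only as a sketch and not as a built-in
block-level qualifier: write the qualifier once in a local macro and append
the macro to each statement. The macro name y03 and the data values below
are invented for illustration; only the variable names come from the tax
code above.

```stata
* Toy data: variable names follow the tax code; the values are made up.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 80 60 90
2004 100 80 60 90
end

* Hypothetical macro name: the qualifier is typed once and expanded on each line.
local y03 "if FLPDYR == 2003"

gen _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `y03'
replace _amt5pc = max(0,_amt5pc) `y03'
gen c62747 = .05*_amt5pc `y03'

list FLPDYR _amt5pc c62747
```

This removes the repeated typing but not the repeated evaluation: each
statement still tests FLPDYR for every observation. Because local macros
vanish when a do-file finishes, the block has to be run as a whole.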
One thing I could do is allow more complex assignment statements, with
fewer of the intermediate values that are used to clarify purpose and show
the correspondence to the tax form. That could reduce the number of
statements by half but is otherwise undesirable.
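
As a sketch of what that trade-off looks like, the first three 2003
statements can be folded into one, since Stata's min() accepts more than
two arguments. The data values are invented for illustration.

```stata
* Toy data: variable names follow the tax code; the values are made up.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 80 60 90
end

* One statement instead of three; the intermediate _amt5pc no longer
* exists, so the correspondence to the tax form line is lost.
gen c62747 = .05*max(0, min(c24533, c24532, c62700, c24517)) if FLPDYR == 2003
```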
Daniel Feenberg