Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Converting a SAS datastep to Stata
Date: Thu, 16 Dec 2010 08:38:39 +0900

Daniel Feenberg wrote:

[snip]

Here is the SAS code for capital gains under the alternative minimum tax for a single year:

    if FLPDYR eq 2003 then do;
       _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
       _amt5pc = max(0,_amt5pc);
       c62747 = .05*_amt5pc;
       _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
       _line50 = sum(e24583,0);
       _amt8pc = min(_line49,_line50);
       c62749 = .08*_amt8pc;
       _amt10pc = _line49 - _amt8pc;
       c62750 = .1*_amt10pc;
       _line55 = c24533 - _amt5pc;
       _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
       _amt15pc = min(_line55,_line56);
       c62755 = .15*_amt15pc;
       _amt20pc = _line56 - _amt15pc;
       c62760 = .2*_amt20pc;
       _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
       c62770 = .25*_amt25pc;
       _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
    end;

[snip]

Repeating the if qualifier means repeating a calculation, which is an inefficiency, but it also means repeating the code, which is ugly and distracting. That is why I asked about the possibility of a block level if qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.

One thing I could do is allow more complex assignment statements, with fewer of the intermediate values that are used to clarify purpose and show the correspondence to the tax form. That could reduce the number of statements by half but is otherwise undesirable.

--------------------------------------------------------------------------------

I'll second Austin's suggestion to move this to Mata. This will be trivial in Mata, using its ability to create updatable views and subviews onto the dataset.

With SAS, the DATA step doesn't have all of the data in memory; it scrolls through the input file one logical record at a time, places its contents into the program data vector, creates the variables or replaces values in them (I can't tell which you're doing from the excerpt), saves the record to an output file, and proceeds to the next logical record in the input file.
(Thus the perennial concern among SAS users about I/O.) The IF block checks the conditions upon reading in the logical record. If the IF condition isn't met, the DATA step goes to the next logical record in the input file without creating/changing the data.

In Stata, in contrast, you have all of the data in memory, and creating/replacing data is "vectorized", and so you'll not get an IF-block style concept _in Stata_.* This is just a consequence of the different data models between SAS and Stata.

But Mata has the ability to select blocks of observations (and variables) in the Stata dataset and work on the block in isolation in situ. Mata's views and subviews give you the very "block-level if qualifier" that you're seeking.

Joseph Coveney

* Absent -use if FLPDYR == 2003 using ALLRETURNS-, -generate/replace . . .- and -save FLPDYR2003- in forced direct analogy with the SAS idiom.
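
As a rough sketch of the view-based approach described above (not code from the thread), the do-file fragment below applies only the first three assignments of the SAS block to the observations with FLPDYR == 2003. It assumes that the input variables from the excerpt (FLPDYR, c24533, c24532, c62700, c24517) are already in memory and that _amt5pc and c62747 do not yet exist. The selection variable in2003 and the Mata names X, Y and amt5pc are made up for illustration, and it uses st_view() with a selection variable rather than st_subview().

```stata
* Sketch only: variable names come from the SAS excerpt; in2003 is hypothetical.
generate byte   in2003  = (FLPDYR == 2003)   // marks the block of observations
generate double _amt5pc = .                  // outputs start out missing everywhere
generate double c62747  = .

mata:
// Views onto the dataset, restricted to observations where in2003 != 0
st_view(X = ., ., ("c24533", "c24532", "c62700", "c24517"), "in2003")
st_view(Y = ., ., ("_amt5pc", "c62747"), "in2003")

// _amt5pc = max(0, min(c24533, c24532, c62700, c24517)), row by row
amt5pc = rowmax((rowmin(X), J(rows(X), 1, 0)))

// Subscripted assignment writes through the view into the Stata dataset
Y[., 1] = amt5pc
Y[., 2] = 0.05 :* amt5pc
end
```

Because Y is a view, the subscripted assignments change the dataset itself, and only for the selected observations; a plain Y = ... would instead rebind Y to a new Mata matrix and leave the data untouched. The remaining lines of the SAS block would follow the same pattern, with the selection stated once when the views are set up.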