From: Scott Merryman <scott.merryman@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Converting a SAS datastep to Stata
Date: Mon, 13 Dec 2010 20:01:57 -0600

Take a look at -foreach- and -forvalues- (-levelsof- is also very useful
in these types of problems). One way would be to do something like:

    local yr = 1993
    gen exemption = .
    qui {
        foreach exemp in 2350 2450 2500 2550 2650 2700 ///
            2750 2800 2900 3000 3050 3100 3200 3300 3400 3500 {
            replace exemption = `exemp' if flpdyr == `yr'
            local yr = `yr' + 1
        }
    }

Scott

On Mon, Dec 13, 2010 at 6:51 PM, Daniel Feenberg <feenberg@nber.org> wrote:
> I have done programs to calculate income tax liability in SAS and fortran.
> Both those languages allow tax parameters that vary across years and filing
> status to be held in initialized arrays. For example, in SAS one could
> declare:
>
>    array exmp(1993:2010) _temporary_;
>    retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100
>                3200 3300 3400 3500;
>
> and then assigning the correct value of the personal exemption to every
> individual record is just:
>
>    exemption = exmp(flpdyr);
>
> where flpdyr is a variable in the data with the filing year. I am at a bit
> of a loss as to how to do this in Stata. I don't like:
>
>    gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450...(for 18
>    subexpressions in all)
>
> or
>
>    gen exemption = 2350 if flpdyr==1993
>    replace exemption = 2450 if flpdyr==1994
>    ...(for 18 lines in all)...
>
> because these require (and execute) so much repetitive code.
>
> Another possibility is to create a dataset of parameters by year and filing
> status, then sort the tax return data by year and filing status, and finally
> merge the parameters onto the tax return data. But that requires a sort and
> a lot of I/O, which could be slow with potentially millions of returns. The
> additional memory required is probably not a big issue.
>
> I don't actually know Mata, but I think I could define a rowvector:
>
>    exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000,
>            3050, 3100, 3200, 3300, 3400, 3500);
>
> and then loop over all the tax returns executing:
>
>    exemption[i] = exmp[flpdyr[i]-1992];
>
> for each return (where i indexes returns). That seems to mean that every
> variable is going to have to carry around a [i] subscript and there will be
> 1,000 lines of Mata code executed for each return (rather than the
> preferred 1,000 lines of code for all the returns together). That is much
> less attractive than leaving the observation number implicit, as the regular
> Stata language does. Brief study of [M-2] subscripts doesn't suggest any
> "matrixy" way of coding this.
>
> I expect I am missing something obvious, can someone point me in the right
> direction?
>
> Thanks
>
> Daniel Feenberg
> NBER
> Cambridge MA
> feenberg@nber.org
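
For readers looking for a lookup without a loop, or for the "matrixy" form asked about above, here is a minimal sketch. It is not from the thread. It makes several assumptions: the toy data are hypothetical, the 16 listed values are taken to cover 1993-2008, and the names exlist and exemption2 are invented for illustration. The first version picks the nth word of the parameter list for each observation. The second subscripts a Mata column vector by a whole vector of positions, so no [i] subscript is needed.

```stata
* Toy data standing in for the tax-return file (hypothetical values)
clear
input int flpdyr
1993
1995
2008
end

* Parameter list copied from the thread: 16 values, assumed to cover 1993-2008
local exlist 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100 3200 3300 3400 3500

* One pass, no loop: take the (flpdyr - 1992)th word of the list.
* The inrange() guard matters because word() counts from the end of the
* string when the position is negative.
gen double exemption = real(word("`exlist'", flpdyr - 1992)) if inrange(flpdyr, 1993, 2008)

* Mata version: subscript a column vector by a vector of positions.
* Assumes every flpdyr is nonmissing and within 1993-2008;
* anything else is an invalid subscript and the call aborts.
gen double exemption2 = .
mata:
exmp = strtoreal(tokens(st_local("exlist")))'
st_store(., "exemption2", exmp[st_data(., "flpdyr") :- 1992])
end

* Both routes should agree on the toy data
assert exemption == exemption2
list
```

The word() version leaves exemption missing for out-of-range years, which is usually the safer failure mode. The Mata version reads and writes whole columns at once, so it may suit a longer calculation that stays in Mata.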