Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Converting a SAS datastep to Stata
From: Scott Merryman <[email protected]>
To: [email protected]
Subject: Re: st: Converting a SAS datastep to Stata
Date: Mon, 13 Dec 2010 20:01:57 -0600
Take a look at -foreach- and -forvalues- (and at -levelsof-, which is also
very useful in these types of problems).
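For example, -levelsof- can collect the filing years actually present in the data, so a loop visits only those years. This is a sketch, assuming the 16 amounts below apply in order to 1993-2008; the toy dataset is purely illustrative:

```stata
* Sketch: loop only over the filing years present in the data.
* Assumption: the 16 amounts apply in order to 1993-2008.
clear
set obs 6
generate flpdyr = 1992 + _n          // toy data: years 1993-1998
local exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100 3200 3300 3400 3500
generate exemption = .
levelsof flpdyr, local(years)        // distinct years, stored in a local
foreach y of local years {
    * year y is word (y - 1992) of the list; inrange() skips years outside it
    quietly replace exemption = real(word("`exmp'", `y' - 1992)) ///
        if flpdyr == `y' & inrange(flpdyr, 1993, 2008)
}
list flpdyr exemption
```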
One way would be to do something like:
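* exemption amounts in year order, starting with 1993
local yr = 1993
gen exemption = .
qui {
    foreach exemp in 2350 2450 2500 2550 2650 2700 ///
        2750 2800 2900 3000 3050 3100 3200 3300 3400 3500 {
        * assign this amount to returns filed in the current year
        replace exemption = `exemp' if flpdyr == `yr'
        * move on to the next filing year
        local yr = `yr' + 1
    }
}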
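If you would rather avoid the loop entirely, Stata's word() and real() functions give a one-line analogue of the SAS lookup exmp(flpdyr). A sketch, again assuming the 16 amounts apply in order to 1993-2008; the toy data are only for illustration:

```stata
* Sketch: loop-free lookup, analogous to SAS exmp(flpdyr).
* Assumption: the 16 amounts apply in order to 1993-2008.
clear
set obs 4
generate flpdyr = 1992 + 2*_n        // toy data: 1994, 1996, 1998, 2000
local exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100 3200 3300 3400 3500
* word(s, n) returns the nth blank-separated word of s and real() makes it
* numeric; inrange() is needed because word() counts from the end of the
* list when n is negative, so years before 1993 would pick wrong amounts
generate exemption = real(word("`exmp'", flpdyr - 1992)) if inrange(flpdyr, 1993, 2008)
list flpdyr exemption
```

Years outside the range simply get missing values.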
Scott
On Mon, Dec 13, 2010 at 6:51 PM, Daniel Feenberg <[email protected]> wrote:
> I have written programs to calculate income tax liability in SAS and Fortran.
> Both those languages allow tax parameters that vary across years and filing
> status to be held in initialized arrays. For example, in SAS one could
> declare:
>
> array exmp(1993:2010) _temporary_ (2350 2450 2500 2550 2650 2700 2750 2800
> 2900 3000 3050 3100 3200 3300 3400 3500);
>
> and then assigning the correct value of the personal exemption to every
> individual record is just:
>
> exemption = exmp(flpdyr);
>
> where flpdyr is a variable in the data with the filing year. I am at a bit
> of a loss as to how to do this in Stata. I don't like:
>
> gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450...(for 18
> subexpressions in all)
>
> or
>
> gen exemption = 2350 if flpdyr==1993
> replace exemption = 2450 if flpdyr==1994
> ...(for 18 lines in all)...
>
> because these require (and execute) so much repetitive code.
>
> Another possibility is to create a dataset of parameters by year and filing
> status, then sort the tax return data by year and filing status, and finally
> merge the parameters onto the tax return data. But that requires a sort and
> a lot of I/O, which could be slow with potentially millions of returns. The
> additional memory required is probably not a big issue.
>
> I don't actually know Mata, but I think I could define a row vector:
>
> exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050,
> 3100, 3200, 3300, 3400, 3500);
>
> and then loop over all the tax returns executing:
>
> exemption[i] = exmp[flpdyr[i]-1992];
>
> for each return (where i indexes returns). That seems to mean that every
> variable is going to have to carry around an [i] subscript and there will be
> 1,000 lines of Mata code executed for each return (rather than the
> preferred 1,000 lines of code for all the returns together). That is much
> less attractive than leaving the observation number implicit, as the regular
> Stata language does. Brief study of [M-2] subscripts doesn't suggest any
> "matrixy" way of coding this.
>
> I expect I am missing something obvious; can someone point me in the right
> direction?
>
> Thanks
>
> Daniel Feenberg
> NBER
> Cambridge MA
> [email protected]
>
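On the Mata question: Mata subscripts accept a vector of positions, so the lookup can be done once for all returns rather than with an explicit [i] loop. A minimal sketch, assuming the same 1993-2008 mapping; the toy data are illustrative, and a year outside that range would raise a subscript error rather than return missing:

```stata
* Sketch: vectorized Mata lookup over all observations at once.
* Assumption: every flpdyr value lies in 1993-2008.
clear
set obs 4
generate flpdyr = 1992 + 2*_n        // toy data: 1994, 1996, 1998, 2000
generate exemption = .
mata:
    // column vector of amounts, position 1 = 1993
    exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)'
    fy = st_data(., "flpdyr")        // N x 1 column of filing years
    ex = exmp[fy :- 1992]            // one subscript operation for every row
    st_store(., "exemption", ex)     // write results back to the dataset
end
list flpdyr exemption
```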