Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: Converting a SAS datastep to Stata
From: Kevin Geraghty <[email protected]>
To: [email protected]
Subject: Re: st: Re: Converting a SAS datastep to Stata
Date: Mon, 13 Dec 2010 20:33:40 -0800 (PST)
FYI, I tried the matrix approach below to satisfy my own curiosity, and it works. It is probably the most parsimonious approach.
This assumes your dataset has a variable "year" taking values from 1993 through 2008, and that the values in "exmp" are listed in ascending year order (16 values, one per year).
matrix input exmp=(2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)
gen int exemption = exmp[1,year-1992]
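To check the two lines above on toy data, here is a self-contained sketch. The toy dataset and the -assert- guard are additions for illustration, not part of the original post; the guard stops with an error if any year has no matching column in exmp, instead of letting the lookup go ahead.

```stata
* Toy data (hypothetical): one observation per filing year, 1993-2008
clear
set obs 16
generate int year = 1992 + _n

* Exemptions in ascending year order: column 1 is 1993, column 16 is 2008
matrix input exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, ///
                     2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)

* Added guard (not in the original post): every year must map to a column
assert inrange(year, 1993, 1992 + colsof(exmp))

* year - 1992 turns each observation's year into a column index
generate int exemption = exmp[1, year - 1992]
list year exemption, noobs separator(0)
```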
----- "Joseph Coveney" <[email protected]> wrote:
> Daniel Feenberg wrote:
>
> I have written programs to calculate income tax liability in SAS and
> Fortran. Both those languages allow tax parameters that vary across
> years and filing status to be held in initialized arrays. For example,
> in SAS one could declare:
>
> array exmp(1993:2010) _temporary_;
> retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050
>             3100 3200 3300 3400 3500;
>
> and then assigning the correct value of the personal exemption to
> every individual record is just:
>
> exemption = exmp(flpdyr);
>
> where flpdyr is a variable in the data holding the filing year. I am at
> a bit of a loss as to how to do this in Stata. I don't like:
>
> gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450 + ...
> (18 subexpressions in all)
>
> or
>
> gen exemption = 2350 if flpdyr==1993
> replace exemption = 2450 if flpdyr==1994
> ... (18 lines in all)
>
> because these require (and execute) so much repetitive code.
>
> Another possibility is to create a dataset of parameters by year and
> filing status, then sort the tax return data by year and filing status,
> and finally merge the parameters onto the tax return data. But that
> requires a sort and a lot of I/O, which could be slow with potentially
> millions of returns. The additional memory required is probably not a
> big issue.
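For comparison, the -merge- route described above might look like the sketch below. It assumes a parameter file keyed only by year (the filing-status dimension is left out) and a variable named year rather than flpdyr; both the parameter file and the toy tax-return data are hypothetical.

```stata
* Hypothetical parameter file: one row per filing year
clear
input int year int exemption
1993 2350
1994 2450
1995 2500
1996 2550
1997 2650
1998 2700
1999 2750
2000 2800
2001 2900
2002 3000
2003 3050
2004 3100
2005 3200
2006 3300
2007 3400
2008 3500
end
tempfile params
save `params'

* Hypothetical stand-in for the tax-return data: three returns per year
clear
set obs 48
generate int year = 1993 + mod(_n - 1, 16)

* Many returns match one parameter row; -merge- handles the sort itself
merge m:1 year using `params', keep(master match) nogenerate
```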
>
> I don't actually know Mata, but I think I could define a row vector:
>
> exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)
>
> and then loop over all the tax returns executing:
>
> exemption[i] = exmp[flpdyr[i]-1992];
>
> for each return (where i indexes returns). That seems to mean that
> every variable is going to have to carry around an [i] subscript, and
> 1,000 lines of Mata code will be executed for each return (rather than
> the preferred 1,000 lines of code for all the returns together). That is
> much less attractive than leaving the observation number implicit, as
> the regular Stata language does. Brief study of [M-2] subscripts doesn't
> suggest any "matrixy" way of coding this.
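Mata does offer a vectorized way to do this lookup: subscripting a vector with a vector of indices returns the matching elements, so no per-return loop is needed. A minimal sketch, assuming every observation has a year between 1993 and 2008, the variable is called year (standing in for flpdyr), and exemption already exists as a Stata variable for st_store() to fill.

```stata
* Toy data (hypothetical): one observation per filing year
clear
set obs 16
generate int year = 1992 + _n
generate int exemption = .

mata:
// Exemptions as a column vector, ascending by year (1993 first)
exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)'
// Copy year into Mata as a column vector
yr = st_data(., "year")
// Index-vector subscript: yr :- 1992 gives positions 1..16, looked up in one step
st_store(., "exemption", exmp[yr :- 1992])
end
```

A year outside 1993-2008 (or a missing year) makes the subscript invalid, so Mata stops with an error rather than storing a wrong value.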
>
> I expect I am missing something obvious. Can someone point me in the
> right direction?
>
> --------------------------------------------------------------------------------
>
> The number of years is limited and they're integers, so you could
> probably get away with value labels and a one-shot work-up (see below).
> This SAS-ish approach might be faster than any -merge- (with its
> implicit -sort-) when you have millions of observations in the
> tax-record dataset.
>
> I'd bet that becoming familiar with Mata's -asarray()- (think: Paul
> Dorfman) will be more gratifying in the long run.
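A minimal sketch of the -asarray()- idea, assuming the same year range and variable names as in the other examples: an associative array keyed by year replaces positional indexing, so the years need not be contiguous. The toy data deliberately include one year (2009) with no entry, which comes back as missing because of asarray_notfound(). This version still loops over observations in Mata, so it illustrates the data structure rather than any speed advantage.

```stata
* Toy data (hypothetical): years 1993-2009, where 2009 has no parameter
clear
set obs 17
generate int year = 1992 + _n
generate int exemption = .

mata:
// Associative array with real-valued keys (years)
A = asarray_create("real")
// Unknown keys return missing instead of a 0 x 0 matrix
asarray_notfound(A, .)
yrs  = (1993::2008)
vals = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)'
for (i = 1; i <= rows(yrs); i++) asarray(A, yrs[i], vals[i])
// Look up each observation's year and write the results back
yr  = st_data(., "year")
out = J(rows(yr), 1, .)
for (i = 1; i <= rows(yr); i++) out[i] = asarray(A, yr[i])
st_store(., "exemption", out)
end
```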
>
> Joseph Coveney
>
> P.S. What does SAS do when you have more index values (18 years) than
> array values (16 exemptions)? Does it pad the last value out to the end
> of the array, or recycle à la R?
>
> version 11.1
>
> clear *
> set more off
> set obs 18
> generate int year = 1992 + _n
>
> *
> * Begin here
> *
> local value_label label define Exemptions
> local year 1993
> foreach exemption in 2350 2450 2500 2550 2650 ///
> 2700 2750 2800 2900 3000 3050 3100 3200 ///
> 3300 3400 3500 3550 3600 {
> local value_label `value_label' `year' "`exemption'"
> local ++year
> }
> `value_label'
> label values year Exemptions
> decode year, generate(exemption)
> _strip_labels year
> destring exemption, replace
> list, noobs abbreviate(20) separator(0)
> exit
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/