Title: Implementing SAS-like ARRAYs in Stata
Author: William Gould, StataCorp
SAS provides an ARRAY facility, and whether Stata provides an analog is a popular question on both our help line and Statalist. There is an analog, but it is going to take some explaining.
First, let us agree on a problem: I have a list of variables—say, mpg, weight, and displ—and I want to do something to each of them. Just to fix ideas, let's pretend that I want to add 1 to each. Thus one solution is
    . replace mpg = mpg + 1
    . replace weight = weight + 1
    . replace displ = displ + 1
That would not be a bad solution if I really did have three variables, but I am using three as an example, and I want you to pretend that I had 100 variables.
If I really wanted to add 1 to each of these variables, I could use foreach:
    . foreach var of varlist mpg weight displ {
          replace `var' = `var' + 1
      }
foreach has a pretty powerful syntax so, using some other dataset, I could compactly refer to my 100 variables:
    . foreach var of varlist x1-x20 pop* d57 {
          replace `var' = `var' + 1
      }
For this example, foreach seems most appropriate, but sometimes a while loop is best. Inside a program I might have the following code:
    while "`1'" != "" {
            replace `1' = `1' + 1
            macro shift
    }
In the above, `1' stands for the variable, and I can refer to it as often as I want just as I did with `var' in the foreach loops above. In my example, I refer to `1' twice: replace `1' = `1' + 1, meaning add 1 to the variable, but that is just my example, and really the replace statement stands for a block of code that does something complicated to the variable.
There are other ways I could code the while loop, such as
    local i = 1
    while "``i''" != "" {
            replace ``i'' = ``i'' + 1
            local i = `i' + 1
    }
This second method avoids macro shift and is faster.
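The doubled quotes in ``i'' are nested macro references: Stata expands the innermost macro first, so `i' becomes a number, and the result is then expanded again as a numbered macro. A minimal sketch, assuming nothing beyond tokenize and local:

```stata
* Fill the numbered macros: `1' is mpg, `2' is weight, `3' is displ
tokenize mpg weight displ

* Pick an index
local i = 2

* ``i'' expands in two steps: first `i' becomes 2, then `2' becomes weight
display "``i''"
```

With i set to 2, the display line shows weight. Changing i selects a different element without disturbing the numbered macros, which is why this form needs no macro shift.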
Whichever way I write it, I need to somehow get Stata to understand that my list is “mpg weight displ”. Here is a complete, working program that you may find useful:
    --------------------- BEGIN --- array.ado --- CUT HERE ---
    program array
            version 9.0
            gettoken usrprog 0 : 0
            syntax varlist
            foreach var of local varlist {
                    `usrprog' `var'
            }
    end
    ---------------------- END --- array.ado --- CUT HERE ---
Using the utility, I could solve my problem with
    . program add1
      1.     replace `1' = `1' + 1
      2. end
or, alternatively, with
    . program add1
      1.     args var
      2.     replace `var' = `var' + 1
      3. end
and then
. array add1 mpg weight displ
To use array, I type array, followed by the name of a program to do something to one variable, followed by a list of variables on which I want the program run. Using my other more-than-100 variable dataset, I could type
. array add1 x1-x20 pop* d57
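There are two steps to using array: first, write a program that does what you want to a single variable, referring to that variable as `1'; second, type array followed by the name of that program and the list of variables.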
You do not have to be dependent on array. Writing your own, custom program is pretty easy. To solve my add-1-to-mpg-weight-and-displ problem, I could write
    . foreach var of varlist mpg weight displ {
    .         replace `var' = `var' + 1
    . }
Or, less elegantly,
    . tokenize mpg weight displ
    . while "`1'" != "" {
    .         replace `1' = `1' + 1
    .         macro shift
    . }
These are really much more SAS-like solutions. I write a custom program, not a general one.
This last program I want you to understand thoroughly. First, let me give you some background on Stata.
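A minimal sketch of the macro features the loop relies on, meant as an illustration rather than a full account. A local macro stores a string, and `name' is replaced by that string before the command runs; tokenize fills the numbered macros `1', `2', and so on; macro shift drops `1' and renumbers the rest. The macro name greeting is made up for the example.

```stata
* A local macro holds text; `greeting' is replaced by that text
* before Stata executes the command
local greeting "hello world"
display "`greeting'"

* tokenize splits its argument on blanks into `1', `2', `3', ...
tokenize mpg weight displ
display "first: `1'   second: `2'   third: `3'"

* macro shift discards `1' and renumbers: the old `2' becomes `1',
* the old `3' becomes `2', and `3' is now empty
macro shift
display "first: `1'   second: `2'   third: `3'"
```

After enough shifts, `1' is empty, which is the condition the while loop tests for.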
That is how macros work. Now let us look at our less elegant program again and understand it:
    (1)  . tokenize mpg weight displ
    (2)  . while "`1'" != "" {
    (3)  .         replace `1' = `1' + 1
    (4)  .         macro shift
    (5)  . }
There are other ways I could write this solution, and here is one that uses while but avoids using macro shift:
    (1)  . local array "mpg weight displ"
    (2)  . local i = 1
    (3)  . local n : word count `array'
    (4)  . while `i' <= `n' {
    (5)  .         local var : word `i' of `array'
    (6)  .         replace `var' = `var' + 1
    (7)  .         local i = `i' + 1
    (8)  . }
The only new thing here is my use of word `i' of `array' in line 5, and you can probably guess what it does. Make the substitutions. The first time through the loop, line 5 reads
local var : word 1 of mpg weight displ
because `i' is 1 and `array' is "mpg weight displ" (sans quotes). Word 1 of "mpg weight displ" is mpg, and so mpg is stored in the macro var.
This second solution is a little longer than the previous one, but it has the advantage that I can generalize it to work with paired arrays. For example,
    . local array1 "mpg weight displ"
    . local array2 "rep78 hdroom trunk"
    . local i = 1
    . local n : word count `array1'
    . while `i' <= `n' {
    .         local var1 : word `i' of `array1'
    .         local var2 : word `i' of `array2'
    .         replace `var1' = `var1' + `var2'
    .         local i = `i' + 1
    . }
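The paired-array loop quietly assumes that both lists have the same number of elements. The sketch below is one way to make that assumption explicit, using forvalues in place of the hand-maintained counter. It assumes the auto dataset shipped with Stata, whose current variable names (displacement, headroom) differ from the older names used in the text; the macro n2 and the error message are invented for the example.

```stata
* Assumption: the auto dataset installed with Stata
sysuse auto, clear

local array1 "mpg weight displacement"
local array2 "rep78 headroom trunk"

* Count the elements of each list and stop if they do not pair up
local n  : word count `array1'
local n2 : word count `array2'
if `n' != `n2' {
    display as error "array1 and array2 must have the same number of elements"
    exit 198
}

* forvalues steps i from 1 to n, so no manual increment is needed
forvalues i = 1/`n' {
    local var1 : word `i' of `array1'
    local var2 : word `i' of `array2'
    replace `var1' = `var1' + `var2'
}
```

Without the check, a shorter second list would leave `var2' empty on the last pass and the replace command would fail with a syntax error partway through the loop.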
Let the macro named array contain a list of variable names. For instance,
local array "mpg weight displ foreign"
The extended macro function word of will pull the ith word from the array. For instance, let macro i contain one of the integers 1, 2, 3, or 4. Then
local x : word `i' of `array'
places the `i'th word of `array' into the macro named x. If i contains 3,
local i = 3
and then
local x : word `i' of `array'
places "displ" in x. In subsequent code, you can use `x' to refer to displ.
You can refer to multiple "arrays" simultaneously:
    local array1 "mpg weight displ"
    local array2 "foreign length turn make"
    ...
    local i = 1
    ...
    local j = 3
    ...
    local x : word `i' of `array1'
    local y : word `j' of `array2'
    ...
    ... `x' ... `y'
In the above, referring to `x' and `y' is equivalent to referring to the selected variable names, and you may use `x' and `y' in any way that you would use a variable name. For example, since Stata variables can be explicitly subscripted—because turn[3] refers to the 3rd observation on variable turn—you can type `y'[3] to refer to the 3rd observation of the `j'th element of array2.
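The explicit-subscript case can be sketched as follows, assuming the auto dataset shipped with Stata is in memory. Because `y' is replaced by the variable name before the command runs, `y'[3] and turn[3] are the same expression by the time Stata evaluates it.

```stata
* Assumption: the auto dataset installed with Stata
sysuse auto, clear

local array2 "foreign length turn make"
local j = 3

* Select the jth element of array2; with j equal to 3 this is turn
local y : word `j' of `array2'

* `y'[3] expands to turn[3], the 3rd observation on turn
display `y'[3]
```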
You can form matrices of variable names should that be desirable. Here is a 4 x 3 example:
    local arow1 "mpg weight displ"
    local arow2 "turn gratio foreign"
    local arow3 "rep78 hdroom trunk"
    local arow4 "length price make"
    ...
    /* the following obtains a[3,2], namely hdroom: */
    local x : word 2 of `arow3'
    ...
    /* the following obtains a[`i',`j']: */
    local x : word `j' of `arow`i''
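To visit every element of such a matrix, two nested loops are enough. The sketch below only displays the names, so no dataset needs to be in memory; the macro ncol is invented for the example, and forvalues is used for brevity where a while loop would serve equally well.

```stata
local arow1 "mpg weight displ"
local arow2 "turn gratio foreign"
local arow3 "rep78 hdroom trunk"
local arow4 "length price make"

* Outer loop over the 4 rows
forvalues i = 1/4 {
    * `arow`i'' expands `i' first, then the resulting row macro
    local ncol : word count `arow`i''

    * Inner loop over the columns of row i
    forvalues j = 1/`ncol' {
        local x : word `j' of `arow`i''
        display "a[`i',`j'] = `x'"
    }
}
```

Counting the words in each row, rather than hard-coding 3, lets the same loop handle rows of unequal length.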
To summarize, to define an M-element array vector named array, type
local array "varname1 varname2 ... varnameM"
To refer to array[i], type
local x : word `i' of `array'
and then refer to `x'.
To define an N x M array matrix named matrix, type
    local matrix1 "varname11 varname12 ... varname1M"
    local matrix2 "varname21 varname22 ... varname2M"
    ...
    local matrixN "varnameN1 varnameN2 ... varnameNM"
To refer to matrix[i,j], type
local x : word `j' of `matrix`i''
and then refer to `x'.