Title | Stata 5: A little flavor of the new reshape command
Author | William Gould, StataCorp
First of all, you can obtain this new version of reshape by updating the Stata ado-files.
Once you have obtained and installed the latest ado-file update, you can type
. help reshape
to obtain the full documentation. Here are the highlights:
You can view the data as a collection of observations Xij. One such collection might be
         (wide form)                       (long form)

 -i-  ------- Xij --------             -i-   -j-       -Xij-
  id  sex  inc80  inc81  inc82          id  year  sex    inc
------------------------------        ----------------------
   1    0   5000   5500   6000           1    80    0   5000
   2    1   2000   2200   3300           1    81    0   5500
   3    0   3000   2000   1000           1    82    0   6000
                                         2    80    1   2000
                                         2    81    1   2200
                                         2    82    1   3300
                                         3    80    0   3000
                                         3    81    0   2000
                                         3    82    0   1000
Using the new reshape, you can convert one form to the other by typing
. reshape long inc, i(id) j(year)      (goes from left form to right)
. reshape wide inc, i(id) j(year)      (goes from right form to left)
In this example, one observation is, at least logically speaking,
+-------- in the wide form -------+      +------ in the long ------+
| . list if id==1                 |      | . list if id==1         |
|                                 |      |                         |
|      id sex inc80 inc81 inc82   |  OR  |      id sex year   inc  |
|  1.   1   0  5000  5500  6000   |      |  1.   1   0   80  5000  |
+---------------------------------+      |  2.   1   0   81  5500  |
                                         |  3.   1   0   82  6000  |
                                         +-------------------------+
and you want to think of this single “observation” as Xij.
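The two conversions above can be tried end to end in a do-file. The following is a minimal sketch, assuming a current Stata release; it re-enters the example values with input and uses only the reshape commands already shown.

```stata
* Sketch: rebuild the wide-form example data and convert it both ways.
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

* wide -> long: each id now occupies three observations, one per year
reshape long inc, i(id) j(year)
list

* long -> wide: back to one observation per id
reshape wide inc, i(id) j(year)
list
```

Listing the data after each step makes it easy to compare the result with the wide and long forms shown above.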
The i variable denotes the logical observation and is often called the group identifier. In our data, i is the variable id.
j denotes the subobservation, so it is often called the subgroup or within-group identifier. j is year in our data, or at least, variable year when the data are in the long form. There is no j variable in the wide form. Instead, the inc variable is suffixed with the values of j, forming inc80, inc81, and inc82.
That leaves only the variable sex, which we did not specify when we typed
. reshape long inc, i(id) j(year)
. reshape wide inc, i(id) j(year)
Since sex was not specified, sex was assumed to be constant within i, and reshape verified this assumption before converting the data. There is no limit to the number of constant-within-i variables, and you do not have to explicitly specify them. reshape now assumes the unmentioned variables are constant and notifies you if this is incorrect.
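To see the constant-within-i check at work, one can feed reshape a long dataset in which an unmentioned variable does vary within i. The sketch below assumes a current Stata release and a do-file. The data are made up for illustration: sex for id 1 in year 81 has been deliberately changed. The scalar name rc_wide is likewise just an illustrative choice, and the exact wording of the message and the return code may differ across releases.

```stata
* Sketch with made-up data: sex is NOT constant within id 1.
clear
input id year sex inc
1 80 0 5000
1 81 1 5500
1 82 0 6000
2 80 1 2000
2 81 1 2200
2 82 1 3300
end

* capture keeps the do-file running; noisily still shows reshape's message
capture noisily reshape wide inc, i(id) j(year)

* keep the return code; a nonzero value means reshape refused to convert
scalar rc_wide = _rc
display "reshape wide return code: " rc_wide
```

If sex really is meant to vary over time, it belongs among the Xij variables, that is, it would be listed alongside inc in the reshape command.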
The syntax of reshape is
reshape {wide|long} X_ij-variables, i(i-variable) j(j-variable)
Here is an example with two Xij variables with the data in wide form:
. list
        id  sex  inc80  inc81  inc82  ue80  ue81  ue82
  1.     1    0   5000   5500   6000     0     1     0
  2.     2    1   2000   2200   3300     1     0     0
  3.     3    0   3000   2000   1000     0     0     1
To convert these data into long form, type
. reshape long inc ue, i(id) j(year)
Note that there is no variable named year in our original wide dataset. year will be a new variable in our long dataset. After conversion, we will have
. list
        id  year  sex    inc  ue
  1.     1    80    0   5000   0
  2.     1    81    0   5500   1
  3.     1    82    0   6000   0
 <output omitted>
  9.     3    82    0   1000   1
Similarly, if we took this dataset and typed
. reshape wide inc ue, i(id) j(year)
we would be back to our original data:
. list
        id  sex  inc80  inc81  inc82  ue80  ue81  ue82
  1.     1    0   5000   5500   6000     0     1     0
  2.     2    1   2000   2200   3300     1     0     0
  3.     3    0   3000   2000   1000     0     0     1
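The claim that the round trip returns the original data can be checked mechanically rather than by eye. The following is a sketch, assuming a current Stata release and a do-file (tempfile names do not survive outside one). The macro name original is an illustrative choice, and cf is used here simply as one convenient way to compare the data in memory with a saved copy.

```stata
* Sketch: reshape two Xij variables long and back, then compare with the start.
clear
input id sex inc80 inc81 inc82 ue80 ue81 ue82
1 0 5000 5500 6000 0 1 0
2 1 2000 2200 3300 1 0 0
3 0 3000 2000 1000 0 0 1
end

* keep a copy of the wide data to compare against later
tempfile original
save `original'

reshape long inc ue, i(id) j(year)
reshape wide inc ue, i(id) j(year)

* cf compares variables by name, so a changed column order does not matter;
* it exits with an error if any value differs from the saved copy
cf _all using `original'
```

If cf finishes without an error, every variable in memory matches the saved wide dataset observation by observation.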
Converting from wide to long creates the j (year) variable.
Converting from long to wide drops the j (year) variable.
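The appearance and disappearance of the j variable can be confirmed directly. The sketch below assumes a current Stata release and a do-file; the scalar names rc_before and rc_after are illustrative, and confirm variable is used only as a simple existence test.

```stata
* Sketch: check whether year exists before, during, and after the round trip.
clear
input id sex inc80 inc81 inc82
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

* wide form: there is no variable named year, so confirm fails (nonzero _rc)
capture confirm variable year
scalar rc_before = _rc

* long form: year now exists and holds the former suffixes 80, 81, 82
reshape long inc, i(id) j(year)
confirm variable year
tabulate year

* wide form again: year has been dropped
reshape wide inc, i(id) j(year)
capture confirm variable year
scalar rc_after = _rc

display "before: " rc_before "   after: " rc_after
```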