Yumin Sheng asked [slight editing here, combining
two postings]
---------------------
I have been trying to save data from World Development
Indicators (WDI, online) into a cross-section
time-series format. The original data look like the
following
Country Variable Name 1960 1961 ... 2002
Afghanistan Aid X X ... X
Afghanistan Population X X ...X
... ... ... ... ... ...
... ... ... ... ... ...
Afghanistan Wage X X ... X
Albania Aid X X ... X
Albania Population X X ... X
... ... ... ... ... ...
... ... ... ... ... ...
Albania Wage X X ... X
The ideal format I would like to have would be:
Country Aid Population ... ... Wage
Afghanistan 1960 X X ... X
Afghanistan 1961 X X ... X
... ... ... ... ... ...
... ... ... ... ... ...
Afghanistan 2002 X X ... X
Albania 1960 X X ... X
Albania 1961 X X ... X
... ... ... ... ... ...
... ... ... ... ... ...
Albania 2002 X X ... X
... ... ... ... ... ...
China 1960 X X ... X
China 1961 X X ... X
... ... ... ... ... ...
... ... ... ... ... ...
China 2002 X X ... X
World Development Indicators have about 200 variables
for each country.
But (1) sometimes many words are contained in
one variable name such as "Export share in GDP, %"
and (2) some countries have data on more
variables than do other countries.
--------------------
Dev Vencappa recommended -reshape- and Toyoto Iwata
posted code for a example dataset. Toyoto's code
doesn't extend to coping with the first problem
above and I am not sure that it would cope with the
second problem above.
Here is a sketch of slightly more general code:
the dataset is just an example. (I guess that
there is a shorter and still general solution
but I can't find it.)
. l
+------------------------------------------+
| country variable v1960 v1961 |
|------------------------------------------|
1. | Afghanistan Aid 1 6 |
2. | Afghanistan Population 2 7 |
3. | Albania Aid 3 8 |
4. | Albania Fishing 4 9 |
5. | Albania Population 5 10 |
+------------------------------------------+
-reshape- to long and then rectangularise.
-fillin- takes care of problem (2).
. reshape long v , i(country var) j(year)
. fillin country var year
. drop _fillin
We know from problem (1) that the values in
-variable- aren't all legal variable names,
so we can't use them directly. One trick is
. egen group = group(variable)
That gives ways of counting variables and
years, so our code is more general:
. su group , meanonly
. local nvars = r(max)
. qui count if country == country[1] & group == group[1]
. local nyears = r(N)
Now we split the composite -v- into variables
and assign variable labels, picking them up
from -variable-:
. gen order = _n
. qui forval i = 1/`nvars' {
. gen v`i' = v if group == `i'
. su order if group == `i', meanonly
. label var v`i' `"`=variable[`r(min)']'"'
. }
. drop v variable order
. l
+-------------------------------------------+
| country year group v1 v2 v3 |
|-------------------------------------------|
1. | Afghanistan 1960 1 1 . . |
2. | Afghanistan 1961 1 6 . . |
3. | Afghanistan 1960 2 . . . |
4. | Afghanistan 1961 2 . . . |
5. | Afghanistan 1960 3 . . 2 |
|-------------------------------------------|
6. | Afghanistan 1961 3 . . 7 |
7. | Albania 1960 1 3 . . |
8. | Albania 1961 1 8 . . |
9. | Albania 1960 2 . 4 . |
10. | Albania 1961 2 . 9 . |
|-------------------------------------------|
11. | Albania 1960 3 . . 5 |
12. | Albania 1961 3 . . 10 |
+-------------------------------------------+
We need to pull values for most variables from
later in order:
. qui forval j = 2/`nvars' {
. replace v`j' = v`j'[_n+`nyears' * (`j' - 1)]
. }
. keep if group == 1
. drop group
. l
+-----------------------------------+
| country year v1 v2 v3 |
|-----------------------------------|
1. | Afghanistan 1960 1 . 2 |
2. | Afghanistan 1961 6 . 7 |
3. | Albania 1960 3 4 5 |
4. | Albania 1961 8 9 10 |
+-----------------------------------+
I also looked at a solution centred on -xpose-,
but the preparation and clean-up were not
easier than this.
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/