Recently Terra Curtis has been posting problems to
do with a particular dataset. Last Thursday I repeated
a comment I made earlier:
====================== 24 October 2005
Also I guess wildly that your data structure is quite
the wrong way round to do much easily in Stata. Solve
this problem, and other problems will be just as messy.
Consider reshaping to panel data structure, and much
may be a lot easier.
======================
I am now going to expand on this guess.
Terra works for an outfit called Cambridge Finance.
It looks as if the data in question are organised
like this:
The observations are companies, or something similar
of interest to people in Cambridge Finance.
The variables are various properties for 58 quarters,
from the first quarter of 1989 through the last of 2005.
Thus something called "lc" is represented by 58
variables and something called "dolq" by another 58
variables. The last problem Terra posted was to
calculate the ratio of lc and dolq, and so
another 58 variables were needed.
There may well be much, much more, each property
being represented by another 58 variables. And
every time Terra wants some other derived property
another 58 variables will be needed.
Digging the same hole deeper like this can only make life
more painful, even if Stata's limit on # of variables
never bites.
The remedy here is to -reshape-. [D] reshape
gives a detailed account. Yet more details
and examples are in the FAQ
http://www.stata.com/support/faqs/data/reshape3.html
In fact, what I am going to suggest is basically
just copied from that FAQ.
We don't need to know more gory details to
suggest a strategy. Suppose also that there is
some identifier variable, here called -id-, so
that -ds- will show something like
. ds
lcq1y1989 dolq1y1991 lcq2y1993 dolq2y1995 lcq3y1997 dolq3y2001 lcq4y2003
dolq1y1989 lcq2y1991 dolq2y1993 lcq3y1995 dolq3y1997 lcq4y2001 dolq4y2003
lcq2y1989 dolq2y1991 lcq3y1993 dolq3y1995 lcq4y1997 dolq4y2001 lcq1y2004
dolq2y1989 lcq3y1991 dolq3y1993 lcq4y1995 dolq4y1997 lcq1y2002 dolq1y2004
lcq3y1989 dolq3y1991 lcq4y1993 dolq4y1995 lcq1y1998 dolq1y2002 lcq2y2004
dolq3y1989 lcq4y1991 dolq4y1993 lcq1y1996 dolq1y1998 lcq2y2002 dolq2y2004
lcq4y1989 dolq4y1991 lcq1y1994 dolq1y1996 lcq2y1998 dolq2y2002 lcq3y2004
dolq4y1989 lcq1y1992 dolq1y1994 lcq2y1996 dolq2y1998 lcq3y2002 dolq3y2004
lcq1y1990 dolq1y1992 lcq2y1994 dolq2y1996 lcq3y1999 dolq3y2002 lcq4y2004
dolq1y1990 lcq2y1992 dolq2y1994 lcq3y1996 dolq3y1999 lcq4y2002 dolq4y2004
lcq2y1990 dolq2y1992 lcq3y1994 dolq3y1996 lcq4y1999 dolq4y2002 lcq1y2005
dolq2y1990 lcq3y1992 dolq3y1994 lcq4y1996 dolq4y1999 lcq1y2003 dolq1y2005
lcq3y1990 dolq3y1992 lcq4y1994 dolq4y1996 lcq1y2000 dolq1y2003 lcq2y2005
dolq3y1990 lcq4y1992 dolq4y1994 lcq1y1997 dolq1y2000 lcq2y2003 dolq2y2005
lcq4y1990 dolq4y1992 lcq1y1995 dolq1y1997 lcq2y2001 dolq2y2003 id
dolq4y1990 lcq1y1993 dolq1y1995 lcq2y1997 dolq2y2001 lcq3y2003
lcq1y1991 dolq1y1993 lcq2y1995 dolq2y1997 lcq3y2001 dolq3y2003
Focus on one set of 58, say lc*
. ds lc*
lcq1y1989 lcq1y1991 lcq1y1993 lcq1y1995 lcq1y1997 lcq1y2000 lcq1y2003 lcq1y2005
lcq2y1989 lcq2y1991 lcq2y1993 lcq2y1995 lcq2y1997 lcq2y2001 lcq2y2003 lcq2y2005
lcq3y1989 lcq3y1991 lcq3y1993 lcq3y1995 lcq3y1997 lcq3y2001 lcq3y2003
lcq4y1989 lcq4y1991 lcq4y1993 lcq4y1995 lcq4y1997 lcq4y2001 lcq4y2003
lcq1y1990 lcq1y1992 lcq1y1994 lcq1y1996 lcq1y1998 lcq1y2002 lcq1y2004
lcq2y1990 lcq2y1992 lcq2y1994 lcq2y1996 lcq2y1998 lcq2y2002 lcq2y2004
lcq3y1990 lcq3y1992 lcq3y1994 lcq3y1996 lcq3y1999 lcq3y2002 lcq3y2004
lcq4y1990 lcq4y1992 lcq4y1994 lcq4y1996 lcq4y1999 lcq4y2002 lcq4y2004
-ds- leaves the list of names in its wake, so we can work on that to
get the suffixes, "q1y1989" on:
. local lc "`r(varlist)'"
. local suffix : subinstr local lc "lc" "", all
. di "`suffix'"
q1y1989 q2y1989 q3y1989 q4y1989 q1y1990 q2y1990 q3y1990 q4y1990 q1y1991 q2y1991 q3y1991 q4y19
> 91 q1y1992 q2y1992 q3y1992 q4y1992 q1y1993 q2y1993 q3y1993 q4y1993 q1y1994 q2y1994 q3y1994
> q4y1994 q1y1995 q2y1995 q3y1995 q4y1995 q1y1996 q2y1996 q3y1996 q4y1996 q1y1997 q2y1997 q3y
> 1997 q4y1997 q1y1998 q2y1998 q3y1999 q4y1999 q1y2000 q2y2001 q3y2001 q4y2001 q1y2002 q2y200
> 2 q3y2002 q4y2002 q1y2003 q2y2003 q3y2003 q4y2003 q1y2004 q2y2004 q3y2004 q4y2004 q1y2005 q
> 2y2005
Now we work the other way, to get the stubs:
. unab stub : *q1y1989
. di "`stub'"
lcq1y1989 dolq1y1989
In our case there are only two stubs, but the problem would be
just the same if there were several more:
. local stub : subinstr local stub "q1y1989" "", all
. di "`stub'"
lc dol
Now we have enough to do the -reshape-:
. reshape long `stub' , i(id) string
(note: j = q1y1989 q1y1990 q1y1991 q1y1992 q1y1993 q1y1994 q1y1995 q1y1996 q1y1997 q1y1998 q1
> y2000 q1y2002 q1y2003 q1y2004 q1y2005 q2y1989 q2y1990 q2y1991 q2y1992 q2y1993 q2y1994 q2y19
> 95 q2y1996 q2y1997 q2y1998 q2y2001 q2y2002 q2y2003 q2y2004 q2y2005 q3y1989 q3y1990 q3y1991
> q3y1992 q3y1993 q3y1994 q3y1995 q3y1996 q3y1997 q3y1999 q3y2001 q3y2002 q3y2003 q3y2004 q4y
> 1989 q4y1990 q4y1991 q4y1992 q4y1993 q4y1994 q4y1995 q4y1996 q4y1997 q4y1999 q4y2001 q4y200
> 2 q4y2003 q4y2004)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 10 -> 580
Number of variables 117 -> 4
j variable (58 values) -> _j
xij variables:
lcq1y1989 lcq1y1990 ... lcq4y2004 -> lc
dolq1y1989 dolq1y1990 ... dolq4y2004 -> dol
-----------------------------------------------------------------------------
The dates are not yet usable:
. levelsof _j, clean
q1y1989 q1y1990 q1y1991 q1y1992 q1y1993 q1y1994 q1y1995 q1y1996 q1y1997 q1y1998 q1y2000 q1y20
> 02 q1y2003 q1y2004 q1y2005 q2y1989 q2y1990 q2y1991 q2y1992 q2y1993 q2y1994 q2y1995 q2y1996
> q2y1997 q2y1998 q2y2001 q2y2002 q2y2003 q2y2004 q2y2005 q3y1989 q3y1990 q3y1991 q3y1992 q3y
> 1993 q3y1994 q3y1995 q3y1996 q3y1997 q3y1999 q3y2001 q3y2002 q3y2003 q3y2004 q4y1989 q4y199
> 0 q4y1991 q4y1992 q4y1993 q4y1994 q4y1995 q4y1996 q4y1997 q4y1999 q4y2001 q4y2002 q4y2003 q
> 4y2004
But we just extract the relevant bits, convert from characters to numbers, and we
then have a Stata date variable.
. gen time = yq(real(substr(_j,-4,4)), real(substr(_j,2,1)))
. levelsof time
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 1
> 39 140 141 142 143 144 145 146 147 148 149 150 151 152 153 158 159 160 165 166 167 168 169
> 170 171 172 173 174 175 176 177 178 179 180 181
. format time %tqCy_q
. l time in 1/10
+--------+
| time |
|--------|
1. | 1989 1 |
2. | 1990 1 |
3. | 1991 1 |
4. | 1992 1 |
5. | 1993 1 |
|--------|
6. | 1994 1 |
7. | 1995 1 |
8. | 1996 1 |
9. | 1997 1 |
10. | 1998 1 |
+--------+
. tsset id time
Now the ratio calculation is laughably simple:
. gen ratio = lc/dolq
Nick
[email protected]
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/