Dear Statalisters,
I have Stata 10.1 IC and I am trying to create individual-specific sums
in a large dataset. The problem is a bit complicated: at the moment I
cycle through all individuals and all variables using the "in" qualifier.
I am curious whether anyone has an idea how to solve this more efficiently.
Here is the problem:
The data are in wide format and look like
ID   Agemonth   Var_1   Var_2   ...   Var_623   Var_624
1    532        2       2             14        14
2    345        7       7             14        Mis
3    236        3       3             Mis       Mis
4    267        2       2             12        12
and so forth ("Mis" denotes a missing value); there are about 50,000
observations. "Agemonth" gives the end of the individual-specific
observation period in months: month 1 is January of the year after the
person turned 14, month 2 is February of that year, and so forth. For
example, ID 1 was observed up to month 532. The suffix of each Var_
variable uses the same time index, so person 1 was observed from Var_1
through Var_532. Unfortunately, that does not mean that Var_533 through
Var_624 are missing for this person; they may hold (invalid) values, as in
the example above.
Each Var_# takes a number of distinct values, and for each value I need to
count how often it occurs per individual. If there were no invalid
observations, I could type
egen sum1 = anycount(Var_*), values(1)
However, this also counts the invalid observations beyond Agemonth.
I ended up looping over individuals (~50,000) and variables (624) and
adding up one cell at a time, but I really doubt that this is the "best"
solution (and hope that it is not):
*******************
#d;
gen sum1 = 0;
sort ID;
gen index = _n;
qui sum index;
forvalues indis = `r(min)'/`r(max)' {;
    di "`indis'";
    forvalues f = 1/624 {;
        * subscript Agemonth explicitly: a bare Agemonth in the if command means Agemonth[1];
        if `f' <= Agemonth[`indis'] {;
            qui replace sum1 = sum1 + (Var_`f' == 1) in `indis';
        };
    };
};
#d cr
*******************
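One way to avoid the loop over individuals might be to loop over the variables only and let -replace- work on all observations at once, with an "if" qualifier limiting each step to individuals whose Agemonth covers that month. This is only a minimal sketch, assuming that the Var_ variables are numeric and that Agemonth is never missing; the toy data are made up and use 3 periods instead of 624:

```stata
* Toy data (made up for illustration): 3 periods instead of 624
clear
input ID Agemonth Var_1 Var_2 Var_3
1 3 1 2 1
2 2 1 2 1
3 1 2 1 .
end

gen sum1 = 0
* on the real data the range would be 1/624
forvalues f = 1/3 {
    * add 1 where the variable equals 1, but only inside the observation window
    quietly replace sum1 = sum1 + (Var_`f' == 1) if `f' <= Agemonth
}
list ID Agemonth sum1
```

With this layout there are 624 passes over the data instead of roughly 50,000 x 624 single-observation replacements. Counting another value would only require changing the "== 1" comparison.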
Another possibility would be to put the data in long format. However,
since I have so many periods, reshaping the data takes a while, even in
portions. I tried it with a 10% sample and "reshape" took more than one
hour (maybe I have to ask for a better computer...).
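For reference, the long-format computation itself could look roughly like the sketch below. It does nothing about the cost of "reshape"; it only shows the counting step once the data are long. The variable name "month" and the toy data (3 periods instead of 624) are assumptions made for illustration:

```stata
* Toy data (made up for illustration): 3 periods instead of 624
clear
input ID Agemonth Var_1 Var_2 Var_3
1 3 1 2 1
2 2 1 2 1
3 1 2 1 .
end

* one row per individual and month; Agemonth is constant within ID
reshape long Var_, i(ID) j(month)
* drop the months outside each individual's observation window
keep if month <= Agemonth
* count the remaining months in which the value is 1
egen sum1 = total(Var_ == 1), by(ID)
list ID month Var_ sum1
```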
Any help would be appreciated!
Thank you,
Johannes
----------------------
Johannes Geyer
Deutsches Institut für Wirtschaftsforschung (DIW Berlin)
German Institute for Economic Research
Department of Public Economics
DIW Berlin
Mohrenstraße 58
10117 Berlin
Tel: +49-30-89789-258
[email protected] wrote on 17/09/2008 16:36:14:
> Bill Gould has a Mata matters column out real soon now in Stata Journal
> 8(3) 2008 that should help here.
>
> Nick
> [email protected]
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Stas
> Kolenikov
> Sent: 17 September 2008 15:22
> To: [email protected]
> Subject: st: analogues of macros in Mata?
>
> Suppose I form say a filename using some string operations, something
> like
>
> myfilename = prefix + "-" + suffix
>
> and then want to save something along the lines of
>
> mata matsave {myfilename} Matrix1 Matrix2 Matrix3
>
> Is there any way to have Mata go into the contents of myfilename
> object there, rather than create the file "myfilename.mmat" in the
> current directory? Or may be explicit version
>
> mata matsave prefix+"-"+suffix Matrix1 Matrix2 Matrix3
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/