Bill,
Thank you very much! I haven't made the transition to Mata yet, but
your solution is yet another reminder that this transition is long
overdue!
cheers,
sandu
On Tue, Nov 3, 2009 at 10:59 AM, William Gould, StataCorp LP
<[email protected]> wrote:
> Sandu Cojocaru <[email protected]> asked,
>
>> I'm having trouble generating a variable that for each member i equals
>> sum(Cj-Ci) over all Cj>Ci where i and j are members of the same group.
>> Here's an example of the data setup - I'm trying to calculate
>> `outcome_var'.
>> For row 1 outcome_var=0, for row 3 = (200-100)+(300-100) = 300...and so on...
>>
>> group_id  member_id    C  outcome_var
>>        1          1  300            0
>>        1          2  200          100
>>        1          3  100          300
>>        2          1  150           50
>>        2          2  200            0
>>        2          3  100          150
>>        2          4   50          300
>> 3 1 and so on...
>
> This question has already been answered elegantly by Martin Weiss
> <[email protected]>. His answer was,
>
>> clear*
>>
>> input byte(group_id member_id) C
>> 1 1 300
>> 1 2 200
>> 1 3 100
>> 2 1 150
>> 2 2 200
>> 2 3 100
>> 2 4 50
>> end
>>
>> compress
>> list, noo sepby(group_id)
>>
>> bys group_id (C): /*
>> */ gen diff=C[_n+1]-C[_n]
>> bys group_id: gen num=_N-_n
>> bys group_id (num): /*
>> */ gen outcome_var=sum(diff*num)
>> sort group_id member_id
>>
>> drop diff num
>> list, noo sepby(group_id)
>
> I'm about to give a different answer. Sometimes one needs to create a
> variable that is a complicated combination of values in different
> observations. There is always a way to do it in Stata, but sometimes
> the solution is elusive and one wishes one could just loop across
> the observations and make the calculation directly, even if that solution
> is inefficient. I want to show how to do that using Mata.
> The basic recipe is
>
> 1. Enter Mata:
>
> . mata
>
> : _
>
>
> 2. Create individual Mata variables that are views onto each of the
> relevant Stata variables. In the above, the relevant Stata variables
> are group_id and C, so create Mata variables of the same
> name:
>
> : st_view(group_id=., ., "group_id")
> : st_view(C=., ., "C")
> : _
>
> 3. Go back to Stata and create the desired new variable, filled
> with missing values. Create a view onto that, too. In this
> example, the desired new variable is outcome_var:
>
> : end
> . gen outcome_var = .
> . mata
> : st_view(outcome_var=., ., "outcome_var")
>
> 4. Loop in Mata to fill in the new variable.
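
A possible variation on step 3, sketched as a do-file: Mata's st_addvar() can create the new variable without leaving Mata. The three-row dataset is made up for illustration, and the variable is added before any view is created, on the assumption that it is safest not to change the dataset once views exist.

```stata
clear
input byte group_id int C
1 300
1 200
1 100
end

mata:
// create outcome_var (filled with missing values) from inside Mata;
// (void) discards the variable index that st_addvar() returns
(void) st_addvar("double", "outcome_var")

// views are set up only after the dataset has its final set of variables
st_view(group_id=., ., "group_id")
st_view(C=., ., "C")
st_view(outcome_var=., ., "outcome_var")
end

describe
```

After this runs, step 4 can proceed exactly as in the recipe, because outcome_var is again a view onto a Stata variable full of missing values.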
>
> Before showing the solution to Sandu's problem, let me show how
> this works in an easier example.
>
>
> An easy example
> ----------------
>
> We want to create a new variable, newx, equal to x+1. We could do this
> in Stata by typing
>
> . gen newx = x + 1
>
> Alternatively, we could achieve the same result by typing,
>
> . gen newx = .
>
> . mata
>
> : st_view(x=., ., "x")
> : st_view(newx=., ., "newx")
>
> : for (i=1; i<=st_nobs(); i++) {
> :     newx[i] = x[i] + 1
> : }
>
> : end
>
> Try it. The result after typing all that Mata code will be the same
> as -gen newx = x + 1-.
>
> Note the use of Mata function st_nobs() to obtain the number of
> observations in the dataset.
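
The easy example can also be written as a do-file rather than typed interactively. This is a minimal sketch: the five-observation dataset is invented for the demonstration, and the closing assert is an added check, not part of the original example.

```stata
clear
set obs 5
generate x = _n            // made-up data: x = 1, 2, ..., 5
generate newx = .

mata:
st_view(x=., ., "x")
st_view(newx=., ., "newx")

// loop over every observation, exactly as in the interactive session
for (i=1; i<=st_nobs(); i++) {
    newx[i] = x[i] + 1
}
end

assert newx == x + 1       // same result as -gen newx = x + 1-
```

In a do-file, writing `mata:` with the colon makes Stata leave Mata at the first error instead of continuing, which is usually what one wants in a script.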
>
>
> Panel data (by)
> ---------------
>
> Panel data adds complication. Pretend we wanted to code the Mata
> equivalent to
>
> . by group: gen newx = x + 1
>
> I know the -by group:- prefix adds nothing to the statement, but
> at this point I want to keep the example simple.
>
> The equivalent Mata code is,
>
> . gen newx = .
>
> . mata
>
> : st_view(group=., ., "group")
> : st_view(x=., ., "x")
> : st_view(newx=., ., "newx")
>
> : obs = panelsetup(group, 1)
>
> : for (g=1; g<=rows(obs); g++) {
> :     for (i=obs[g,1]; i<=obs[g,2]; i++) {
> :         newx[i] = x[i] + 1
> :     }
> : }
>
> : end
>
> In the above code, I assume the data are already sorted by group.
>
> Note the line
>
> : obs = panelsetup(group, 1)
>
> If we had two groups -- it wouldn't matter if they were numbered 1 and 2
> or 6*_pi and 9 -- and we had three observations in the first group
> and five in the second, matrix obs would contain
>
> 1 3
> 4 8
>
> The first row states the observation numbers corresponding to the first
> group (1 to 3); the second row states the observation numbers
> corresponding to the second (4 through 8). The matrix has two rows
> because there are two groups. The matrix always has 2 columns.
> See -help mata panelsetup()-.
>
> In the loop that follows, the outer loop (g) loops across the by
> groups. The inner loop (i) loops across the observations within
> the group.
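
To see what panelsetup() returns for the two-group case just described, here is a small sketch that runs on a hand-typed column vector instead of a view. The ID values 6 and 9 are arbitrary; only the runs of equal values matter.

```stata
mata:
group = (6 \ 6 \ 6 \ 9 \ 9 \ 9 \ 9 \ 9)    // made-up IDs: 3 obs, then 5 obs
obs = panelsetup(group, 1)
obs                                          // displays rows (1, 3) and (4, 8)
assert(obs == (1, 3 \ 4, 8))
obs[., 2] - obs[., 1] :+ 1                   // group sizes: 3 and 5
end
```

Because panelsetup() only looks for runs of equal values, the data must be sorted by the group variable first; otherwise the same group can appear as several separate rows.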
>
>
> Putting it all together; the solution to Sandu's problem
> --------------------------------------------------------
>
> Here is the solution to Sandu's problem:
>
> . sort group_id
> . gen outcome_var = .
>
> . mata
>
> : st_view(group_id=., ., "group_id")
> : st_view(C=., ., "C")
> : st_view(outcome_var=., ., "outcome_var")
>
> : obs = panelsetup(group_id, 1)
>
> : for (g=1; g<=rows(obs); g++) {
> :     for (i=obs[g,1]; i<=obs[g,2]; i++) {
> :         sum = 0
> :         for (j=obs[g,1]; j<=obs[g,2]; j++) {
> :             if (C[j]>C[i]) sum = sum + (C[j]-C[i])
> :         }
> :         outcome_var[i] = sum
> :     }
> : }
>
> : end
>
> Note the line
>
> if (C[j]>C[i]) sum = sum + (C[j]-C[i])
>
> That line is coded almost exactly as Sandu stated the problem:
> He requested the sum(Cj-Ci) over all Cj>Ci where i and j are members
> of the same group.
>
> In the code above, the outer loop (g) loops over group_id. The next
> loop (i) loops over the members of the group. The inner loop (j)
> also loops over the members of the group so that we obtain all
> combinations of i and j.
>
> Martin's solution executes more quickly than the above solution. I
> tried both solutions on 5,000 groups, each with 100 members. Martin's
> solution ran in 2.28 seconds. Mine took 35 seconds! That's not so
> much Mata's fault as mine. My solution is not clever; I performed
> the -if (C[j]>C[i]) sum = sum + (C[j]-C[i])- statement 50,000,000 times!
>
> So what? My solution was not clever and neither did it depend on me
> being clever. I wonder which one of us had a solution to this problem
> sooner? I just plugged into the recipe:
>
> 1. Enter Mata.
>
> 2. Create individual Mata variables that are views onto each of the
> relevant Stata variables.
>
> 3. Go back to Stata and create the desired new variable, filled
> with missing values. Create a view onto that, too.
>
> 4. Loop in Mata to fill in the new variable.
>
> The only new code I wrote was for (4), and that read
>
> : for (g=1; g<=rows(obs); g++) {
> :     for (i=obs[g,1]; i<=obs[g,2]; i++) {
> :         sum = 0
> :         for (j=obs[g,1]; j<=obs[g,2]; j++) {
> :             if (C[j]>C[i]) sum = sum + (C[j]-C[i])
> :         }
> :         outcome_var[i] = sum
> :     }
> : }
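
The pairwise loop does work proportional to the square of the group size. As a hedged sketch of one way to keep the same recipe while avoiding that, the do-file below sorts each group's values once and uses running sums: for the k-th smallest value, the sum of (Cj-Ci) over larger values equals the total of the values ranked above it minus their count times the value itself. This is an illustration and not the code from the thread; it has not been timed against either solution, it assumes C has no missing values (Mata's sum() skips them), and the names Cg, p, s, gap, and res are introduced here.

```stata
clear
input byte(group_id member_id) C
1 1 300
1 2 200
1 3 100
2 1 150
2 2 200
2 3 100
2 4 50
end

sort group_id, stable
gen outcome_var = .

mata:
st_view(group_id=., ., "group_id")
st_view(C=., ., "C")
st_view(outcome_var=., ., "outcome_var")
obs = panelsetup(group_id, 1)

for (g=1; g<=rows(obs); g++) {
    Cg = panelsubmatrix(C, g, obs)    // this group's values of C
    n  = rows(Cg)
    p  = order(Cg, 1)                 // permutation that sorts Cg ascending
    s  = Cg[p]                        // sorted values
    // (sum of values ranked above k) - (how many are above k) * s[k]
    gap = (sum(s) :- runningsum(s)) - ((n :- (1::n)) :* s)
    res = J(n, 1, .)
    res[p] = gap                      // undo the sort
    outcome_var[|obs[g,1], 1 \ obs[g,2], 1|] = res
}
end

list, noobs sepby(group_id)
```

On Sandu's seven example rows this should reproduce the outcome_var column from the original question (0, 100, 300 for group 1 and 50, 0, 150, 300 for group 2). Tied values contribute zero to each other, so the strict inequality in the problem statement is respected.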
>
> -- Bill
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>