There is probably a -merge- solution.
Failing that, a single loop over observations
will do:
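* gpa_f will hold the mean GPA of each respondent's friends
gen gpa_f = .
qui forval i = 1/`=_N' {
    * next line may wrap
    su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', `=friend3[`i']', `=friend4[`i']'), meanonly
    * r(mean) is missing if none of the friends is in the data
    replace gpa_f = r(mean) in `i'
}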
If your ids are string, then you need instead
inlist(id,"`=friend1[`i']'", "`=friend2[`i']'", "`=friend3[`i']'", "`=friend4[`i']'")
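
For what it is worth, here is a rough sketch of what a -merge- solution might look like. It is not tested on real data and rests on assumptions that are mine, not Chris's: numeric ids, id unique across observations, and Stata 11 or later for the -merge m:1- syntax. The toy data and the names fid, gpaf and slot are made up for illustration. Run it as a do-file, since it uses tempfiles.

```stata
* Toy data (hypothetical): friend 99 is not in the database
clear
input id gpa friend1 friend2 friend3 friend4
1 3.0 2 3 . .
2 3.5 1 . . .
3 2.5 1 2 99 .
4 4.0 . . . .
end

tempfile lookup means

* Lookup table: one row per id, with gpa stored under new names
preserve
keep id gpa
rename id fid
rename gpa gpaf
save `lookup'
restore

* Long layout: one row per (respondent, friend slot)
preserve
keep id friend1-friend4
reshape long friend, i(id) j(slot)
drop if missing(friend)
rename friend fid

* Attach each friend's gpa; friends not in the data stay missing
merge m:1 fid using `lookup', keep(master match) nogenerate

* Mean over the friends found, one row per respondent
collapse (mean) gpa_f = gpaf, by(id)
save `means'
restore

* Bring the means back to the original data
merge 1:1 id using `means', nogenerate
sort id
list id gpa gpa_f
```

Unlike the -inlist()- loop, this counts a friend twice if the same id appears in two of friend1-friend4. Whether it is faster than the loop will depend on the data, so time both before committing to either.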
Nick
Chris Ruebeck
> Suppose my data set has these 6 variables,
>
> id : this respondent's ID,
> gpa : this respondent's GPA, and
> friend1-4 : the IDs (possibly missing) of this
> respondent's friends.
>
> I would like to create four new variables that record the GPA of each
> respondent's friends, and then take their average. I have many
> observations and want to avoid slower methods. Here is my code for
> the first friend.
>
> gen gpaf1 = .
> egen group = group(friend1)
> summarize group, meanonly
> forvalues num = 1/`r(max)' {
> summarize id if group==`num', meanonly
> local idf = r(mean)
> summarize gpa if id==`idf', meanonly
> replace gpaf1 = r(mean) if group==`num'
> }
>
> I figure I can nest this in a forvalues loop from 1-4, and then use
> -egen ... rowmean(gpaf1-4)- to get the mean over friends. In the code
> above, levelsof could replace the -egen ... group(friend1)- but macro
> length limits would require splitting the friends' ids into two to
> four groups.
>
> Is there a faster method, perhaps with Mata?
>
> (An additional wrinkle: some friends may no longer be in the
> database---so an observation's friend1, for example, may contain a
> number that is not the id of any observation. I think the code above
> is robust to that problem, but perhaps this is another potential
> speed improvement.)