I don't know about "the key". Early in the thread
I offered a solution that is shorter and simpler
than any based on -merge-. It might be slower, but
I've yet to hear a report on that.
Here is the code again.
gen gpa_f = .
qui forval i = 1/`=_N' {
// next line may wrap
su gpa if inlist(id,`=friend1[`i']', `=friend2[`i']', `=friend3[`i']', `=friend4[`i']') , meanonly
replace gpa_f = r(mean) in `i'
}
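For anyone who wants to check the loop before running it on real data, here is a minimal sketch on a toy data set. The five observations below are made up for illustration and are not from Chris's data. Observation 3 lists a friend (9) who is not in the data, and observation 4 has no friends at all.

```stata
clear
* invented example data: id, gpa, and up to four friend ids
input id gpa friend1 friend2 friend3 friend4
1 3.0 2 3 . .
2 3.5 1 . . .
3 2.5 1 2 9 .
4 4.0 . . . .
5 2.0 4 . . .
end

gen gpa_f = .
qui forval i = 1/`=_N' {
    // mean gpa over observations whose id is one of this row's friends
    su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', `=friend3[`i']', `=friend4[`i']'), meanonly
    replace gpa_f = r(mean) in `i'
}
list id gpa friend1-friend4 gpa_f
```

Friend ids that match no observation are simply ignored, and a respondent with no matching friends keeps gpa_f missing, because -summarize- then finds no observations and r(mean) evaluates to missing.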
Nick
Chris Ruebeck
> Thanks! I see the key is using -rename- and -merge- .
Stas Kolenikov
> > I would go with a merge, something like
> >
> > tempfile friend1 friend2 friend3 friend4
> >
> > preserve
> > keep id gpa
> > rename id friend
> > forvalues k=1/4 {
> > rename friend friend`k'
> > rename gpa gpa_f`k'
> > // note that the abbreviation friend will be matched to friend1 when k==2, etc.
> > sort friend`k'
> > save `friend`k''
> > }
> > restore
> >
> > forvalues k=1/4 {
> > sort friend`k'
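> > // nokeep drops ids that are nobody's friend`k'
> > merge friend`k' using `friend`k'', nokeep
> > // _merge must be dropped before the next merge can create it again
> > drop _merge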
> > }
> >
> > egen peer_gpa = rmean(gpa_f*)
> >
> > Of course I have not tested it, but it should give you an idea.
> > I don't know if it is going to be much faster (and it very well might
> > be), but it is also somewhat clearer, I think.
> >
> > On 8/30/06, Chris Ruebeck wrote:
> >> (Previously sent but didn't see it appear on Statalist.)
> >>
> >> Suppose my data set has these 6 variables,
> >>
> >> id : this respondent's ID,
> >> gpa : this respondent's GPA, and
> >> friend1-4 : the IDs (possibly missing) of this
> >> respondent's friends.
> >>
> >> I would like to create four new variables that record the GPA of each
> >> respondent's friends, and then take their average. I have many
> >> observations and want to avoid slower methods. Here is my code for
> >> the first friend.
> >>
> >> gen gpaf1 = .
> >> egen group = group(friend1)
> >> summarize group, meanonly
> >> forvalues num = 1/`r(max)' {
> >> summarize friend1 if group==`num', meanonly
> >> local idf = r(mean)
> >> summarize gpa if id==`idf', meanonly
> >> replace gpaf1 = r(mean) if group==`num'
> >> }
> >>
> >> I figure I can nest this in a forvalues loop from 1-4, and then use
> >> -egen ... rowmean(gpaf1-4)- to get the mean over friends. In the code
> >> above, levelsof could replace the -egen ... group(friend1)- but macro
> >> length limits would require splitting the friends' ids into two to
> >> four groups.
> >>
> >> Is there a faster method, perhaps with Mata?
> >>
> >> (An additional wrinkle: some friends may no longer be in the
> >> database---so an observation's friend1, for example, may contain a
> >> number that is not the id of any observation. I think the code above
> >> is robust to that problem, but perhaps this is another potential
> >> speed improvement.)