Statalist The Stata Listserver



st: RE: RE: RE: RE: Friends' characteristics


From   "Carter Rees" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: RE: RE: Friends' characteristics
Date   Thu, 31 Aug 2006 08:53:18 -0400

Nick & Chris,

Speed was a problem for me. The issue I ran into with the -merge- solution
was the huge file created by the -reshape- to long and then back to wide. I
had a lot of friend variables that I wanted in wide format so that I could
compute group averages and the like. I found that after -reshape long- and
-merge-, -collapse- was a handy piece of code.
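A minimal sketch of the -reshape long-/-merge-/-collapse- route described above, using the toy data from the 2006 code quoted later in this thread. It assumes Stata 11 or later for the -merge m:1- syntax (the quoted code uses the older -merge varlist using- form). The names gpas, slot, gpa_f and n_f are illustrative, not from the original posts.

```stata
* Sketch: mean friend gpa via -reshape long-, -merge-, -collapse-.
* Assumes Stata 11+ (merge m:1); gpas, slot, gpa_f, n_f are illustrative names.
clear
input frnd gpa
99 2.5
88 3.1
77 4
66 1.8
55 3.6
44 2.9
end
tempfile gpas
save `gpas'

clear
input aid frnd1 frnd2 frnd3 frnd4
99 66 77  .  .
88 77 99  .  .
77 55 44 99  .
66 88 99 44 77
55 44  .  .  .
44 66  .  .  .
end

* one row per (respondent, friend slot); drop empty slots
reshape long frnd, i(aid) j(slot)
drop if missing(frnd)

* attach each friend gpa; friends not found in gpas keep a missing gpa
merge m:1 frnd using `gpas', keep(master match) nogenerate

* one row per respondent: mean friend gpa and count of friends with a gpa
collapse (mean) gpa_f = gpa (count) n_f = gpa, by(aid)
list
```

Because -collapse- works directly on the long file, the wide file never has to be built, which is one way around the file-size problem mentioned above.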

Carter Rees
School of Criminal Justice
University at Albany, SUNY


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, August 31, 2006 5:41 AM
To: [email protected]
Subject: st: RE: RE: RE: Friends' characteristics

Thanks for the confirmation. Two points remain: 

1. Each solution has to be extended to multiple files. 

2. Which is faster? 

Nick 
[email protected] 

Carter Rees
 
> A bit of code (with naming conventions changed from the original) was
> provided to me previously by Maarten Buis and later commented upon by
> Nick Cox. My question was essentially the same as yours, and a small
> modification of this code helped immensely. The final format will allow
> you to calculate a mean across your gpa variables.
> 
> The original conversation can be found here:  
> http://www.stata.com/statalist/archive/2006-03/msg00612.html
> 
> So, Nick is correct in that there is a -merge- solution.
> 
> Note: this will create the gpa variables associated with each nominated
> friend.
> drop _all
> tempfile a
> input frnd gpa
>       99   2.5
>       88   3.1
>       77   4
>       66   1.8
>       55   3.6
>       44   2.9
> end
> * old -merge- syntax needs both files sorted by the key
> sort frnd
> save `a'
> 
> drop _all
> input aid frnd1 frnd2 frnd3 frnd4
>       99   66    77   .     .    
>       88   77    99   .     .    
>       77   55    44   99    .
>       66   88    99   44   77
>       55   44    .    .    .
>       44   66    .    .    .
> end
> 
> * one observation per (aid, friend slot); the slot number goes in _j
> reshape long frnd, i(aid)
> drop if frnd == .
> sort frnd
> merge frnd using `a'
> * _merge == 2: gpa records for students nobody nominated
> drop if _merge == 2
> drop _merge
> reshape wide frnd gpa, i(aid) j(_j)
> list
> save test2, replace
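For comparison, a sketch of the quoted steps in Stata 11+ syntax, finishing with the mean across the gpa variables that the quoted message mentions. The -egen- functions rowmean() and rownonmiss() skip missing values, so respondents with fewer than four nominated friends are handled. The names gpas, slot, gpa_mean and n_friends are illustrative.

```stata
* Sketch: the quoted steps in Stata 11+ syntax, ending with a row mean.
* gpas, slot, gpa_mean, n_friends are illustrative names.
clear
input frnd gpa
99 2.5
88 3.1
77 4
66 1.8
55 3.6
44 2.9
end
tempfile gpas
save `gpas'

clear
input aid frnd1 frnd2 frnd3 frnd4
99 66 77  .  .
88 77 99  .  .
77 55 44 99  .
66 88 99 44 77
55 44  .  .  .
44 66  .  .  .
end

reshape long frnd, i(aid) j(slot)
drop if missing(frnd)
merge m:1 frnd using `gpas', keep(master match) nogenerate
reshape wide frnd gpa, i(aid) j(slot)

* rowmean() and rownonmiss() ignore missing values in gpa1-gpa4
egen gpa_mean  = rowmean(gpa*)
egen n_friends = rownonmiss(gpa*)
list aid gpa_mean n_friends
```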

Nick Cox
 
> There is probably a -merge- solution.
> 
> In this case, at worst, a solution is a single loop over observations.
> 
> gen gpa_f = .
> 
> qui forval i = 1/`=_N' {
> 	su gpa if inlist(id, `=friend1[`i']', `=friend2[`i']', ///
> 		`=friend3[`i']', `=friend4[`i']'), meanonly
> 	replace gpa_f = r(mean) in `i'
> }
> 
> If your ids are string, then you need instead
> 
> inlist(id, "`=friend1[`i']'", "`=friend2[`i']'", "`=friend3[`i']'", "`=friend4[`i']'")
 
Chris Ruebeck
  
> > Suppose my data set has these 6 variables,
> > 
> > 	id : this respondent's ID,
> > 	gpa : this respondent's GPA, and
> > 	friend1-4 : the IDs (possibly missing) of this respondent's friends.
> > 
> > I would like to create four new variables that record the GPA of each
> > respondent's friends, and then take their average. I have many
> > observations and want to avoid slower methods. Here is my code for the
> > first friend.
> > 
> > gen gpaf1 = .
> > egen group = group(friend1)
> > summarize group, meanonly
> > forvalues num = 1/`r(max)' {
> > 	summarize friend1 if group==`num', meanonly
> > 	local idf = r(mean)
> > 	summarize gpa if id==`idf', meanonly
> > 	replace gpaf1 = r(mean) if group==`num'
> > }
> > 
> > I figure I can nest this in a forvalues loop from 1 to 4, and then use
> > -egen ... rowmean(gpaf1-gpaf4)- to get the mean over friends. In the
> > code above, -levelsof- could replace the -egen ... group(friend1)-, but
> > macro length limits would require splitting the friends' ids into two
> > to four groups.
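Chris's plan, nesting the per-friend lookup over friend1 to friend4 and finishing with -egen, rowmean()-, might look like this sketch. It takes each friend's id from friend`k' rather than from id, skips a friend variable that is entirely missing, and runs on small illustrative data in the layout Chris describes, including a friend id (11) that matches no respondent. The variable names gpaf1-gpaf4 and gpa_fmean are assumptions.

```stata
* Sketch: per-friend lookup nested over friend1-friend4, then a row mean.
* Illustrative toy data in the layout id, gpa, friend1-friend4.
clear
input id gpa friend1 friend2 friend3 friend4
99 2.5 66 77  .  .
88 3.1 77 99  .  .
77 4   55 44 99  .
66 1.8 88 99 44 77
55 3.6 44  .  .  .
44 2.9 66 11  .  .
end

forvalues k = 1/4 {
    quietly generate gpaf`k' = .
    tempvar grp
    * one group per distinct nonmissing friend id
    quietly egen `grp' = group(friend`k')
    summarize `grp', meanonly
    if r(N) == 0 continue
    local G = r(max)
    forvalues num = 1/`G' {
        * inside one group the friend variable is constant: it is the friend id
        summarize friend`k' if `grp' == `num', meanonly
        local idf = r(mean)
        * an id with no match (friend left the data) leaves the value missing
        summarize gpa if id == `idf', meanonly
        quietly replace gpaf`k' = r(mean) if `grp' == `num'
    }
    drop `grp'
}
egen gpa_fmean = rowmean(gpaf1-gpaf4)
list id gpaf1-gpaf4 gpa_fmean
```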
> > 
> > Is there a faster method, perhaps with Mata?
> > 
> > (An additional wrinkle: some friends may no longer be in the database,
> > so an observation's friend1, for example, may contain a number that is
> > not the id of any observation. I think the code above is robust to that
> > problem, but perhaps this is another potential speed improvement.)
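One possible answer to the Mata question, offered as a sketch rather than a benchmarked solution: build a hash table from id to gpa once with Mata's asarray() functions (available in later Stata releases, not the 2006 versions discussed here), then look up each friend. This replaces the repeated -summarize ... if- scans with one pass to build the table plus one lookup per friend, and friend ids that match no respondent are simply skipped, which covers the wrinkle above. The function name friend_gpa_mean() and the variable gpa_f are hypothetical.

```stata
* Sketch: Mata hash-table lookup of friends' gpa (hypothetical function name).
clear
input id gpa friend1 friend2 friend3 friend4
99 2.5 66 77  .  .
88 3.1 77 99  .  .
77 4   55 44 99  .
66 1.8 88 99 44 77
55 3.6 44  .  .  .
44 2.9 66 11  .  .
end

capture mata mata drop friend_gpa_mean()
mata:
// Mean gpa of each row's friends; missing or unknown friend ids are skipped.
void friend_gpa_mean(string scalar idvar, string scalar gpavar, string scalar fvars, string scalar newvar)
{
    real colvector id, gpa, m
    real matrix    F
    real scalar    i, j, s, n
    transmorphic   A

    id  = st_data(., idvar)
    gpa = st_data(., gpavar)
    F   = st_data(., fvars)

    // hash table keyed by id, built once
    A = asarray_create("real")
    for (i = 1; i <= rows(id); i++) asarray(A, id[i], gpa[i])

    m = J(rows(F), 1, .)
    for (i = 1; i <= rows(F); i++) {
        s = 0
        n = 0
        for (j = 1; j <= cols(F); j++) {
            if (F[i, j] < . & asarray_contains(A, F[i, j])) {
                s = s + asarray(A, F[i, j])
                n++
            }
        }
        if (n > 0) m[i] = s / n
    }
    (void) st_addvar("double", newvar)
    st_store(., newvar, m)
}
end

mata: friend_gpa_mean("id", "gpa", "friend1 friend2 friend3 friend4", "gpa_f")
list id gpa_f
```

The table lookup makes the cost roughly proportional to the number of observations times the number of friend variables, instead of scanning the whole dataset once per distinct friend.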

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



