Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Erik Aadland <erikaadland@hotmail.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: creating variable summarizing for each individual properties of other members of a group at t-1
Date: Tue, 17 May 2011 18:13:45 +0000
Thank you so much for your input, Nick. I have previously experimented with generating different variables, relying on the very helpful FAQ in question, but I am still struggling with this problem. When I apply the suggested code below, the count of peers' entries appears to add up to more than it should for ind_ids with more ind_entry = 1 than other ind_ids, and those ind_ids consequently contribute more to the "score" than ind_ids with fewer ind_entry = 1.

Referring to the example dataset, ind_id 2 is given the correct "prevscore". For ind_id 4, however, it is not. By yearmonth 12, ind_id 4 has contributed two ind_entry = 1 to the "score", which is correct for ind_id 2. However, ind_id 2 has not yet experienced ind_entry = 1. Consequently, subtracting only the current ind_entry still leaves ind_id 4 with score = 1 in yearmonths 11 and 12, when the correct score is 0. And so on.

Here is the suggested code as I applied it:

    #delimit ;
    clear ;
    use "ind_entry_ex.dta" ;
    sort yearmonth ;
    gen score = sum(ind_entry) ;
    by yearmonth: replace score = score[_N] ;
    replace score = score - ind_entry ;
    bysort ind_id (yearmonth): gen prevscore = score[_n-1] ;

Here is the output:

    year  month  yearmonth  ind_id  ind_entry  score  prevscore
    2003     10         10       2          0      1          .
    2003     11         11       2          0      1          1
    2003     12         12       2          0      2          1
    2004      1         13       2          0      2          2
    2004      2         14       2          1      2          2
    2004      3         15       2          0      3          2
    2003     10         10       4          1      0          .
    2003     11         11       4          0      1          0
    2003     12         12       4          1      1          1
    2004      1         13       4          0      2          1
    2004      2         14       4          0      3          2
    2004      3         15       4          0      3          3

I use Stata 10.

Thanks again and kind regards,
Erik

----------------------------------------
> From: n.j.cox@durham.ac.uk
> To: statalist@hsphsun2.harvard.edu
> Date: Tue, 17 May 2011 17:47:53 +0100
> Subject: st: RE: creating variable summarizing for each individual properties of other members of a group at t-1
>
> I don't know what "I am familiar with" means here. Does it mean that you've read the FAQ but can't see how to apply it?
>
> This sounds to me like
>
> 1. Get the sum of all individual entries
>
>     sort yearmonth
>     gen score = sum(ind_entry)
>     by yearmonth : replace score = score[_N]
>
> 2. Subtract this individual
>
>     replace score = score - ind_entry
>
> 3. Look one step back in time
>
>     bysort ind_id (yearmonth) : gen prevscore = score[_n-1]
>
> Nick
> n.j.cox@durham.ac.uk
>
> Erik Aadland
>
> I need to create a variable that sums, for each individual in my dataset, the total number of ind_entry of all other individuals at time yearmonth - 1. I have included a small example of my data structure below. So, for instance, given the small dataset below, for ind_id 2 in yearmonth 11 this variable score = 1, but for ind_id 4 in the same yearmonth, score = 0.
>
> I would also like to generate a variable that identifies, for each individual, the number of unique other individuals in the dataset that have experienced ind_entry = 1 at least once up until time yearmonth - 1.
>
> I am familiar with the following FAQ: http://www.stata.com/support/faqs/data/members.html
>
> My data structure is, in principle, snapshot data like the example below, but some individuals enter the observation window later than others (i.e. in later yearmonths):
>
>     year  month  yearmonth  ind_id  ind_entry
>     2003     10         10       2          0
>     2003     11         11       2          0
>     2003     12         12       2          0
>     2004      1         13       2          0
>     2004      2         14       2          1
>     2004      3         15       2          0
>     2003     10         10       4          1
>     2003     11         11       4          0
>     2003     12         12       4          1
>     2004      1         13       4          0
>     2004      2         14       4          0
>     2004      3         15       4          0
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
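One way to get both requested variables, offered only as a minimal sketch under stated assumptions and not as a tested answer from either poster: the step "replace score = score - ind_entry" removes just the current month's own entry, while the running sum still carries the individual's earlier entries (ind_id 4's entry in yearmonth 10 is why its score stays at 1 in yearmonths 11 and 12). Subtracting the individual's own running total instead, and working with "strictly earlier months" directly rather than through score[_n-1], avoids that. The sketch assumes ind_entry is coded 0/1, yearmonth is an integer month index, there is at most one row per ind_id per yearmonth, and "up until yearmonth - 1" means cumulatively over all earlier months, as the follow-up suggests. The helper variables own, firstentry, own_first, total, total_first, month_total and month_first, and the results prevscore and prevdistinct, are hypothetical names chosen for illustration; the example data are rebuilt with input so the do-file runs on its own.

```stata
* Sketch only (hypothetical helper names): counts over all OTHER
* individuals in months strictly before the current yearmonth.
* Assumes ind_entry is 0/1 and one row per ind_id per yearmonth.
clear
input year month yearmonth ind_id ind_entry
2003 10 10 2 0
2003 11 11 2 0
2003 12 12 2 0
2004  1 13 2 0
2004  2 14 2 1
2004  3 15 2 0
2003 10 10 4 1
2003 11 11 4 0
2003 12 12 4 1
2004  1 13 4 0
2004  2 14 4 0
2004  3 15 4 0
end

* Each individual's own running total of entries, plus a flag for its
* first ever entry (used for the count of distinct other individuals)
bysort ind_id (yearmonth): gen own = sum(ind_entry)
gen byte firstentry = ind_entry == 1 & own == 1
by ind_id: gen own_first = sum(firstentry)

* Running totals over everybody through the end of each yearmonth,
* and the totals contributed within that yearmonth alone
sort yearmonth ind_id
gen total = sum(ind_entry)
gen total_first = sum(firstentry)
by yearmonth: gen month_total = sum(ind_entry)
by yearmonth: gen month_first = sum(firstentry)
foreach v of varlist total total_first month_total month_first {
    by yearmonth: replace `v' = `v'[_N]
}

* (everybody in earlier months) - (this individual in earlier months)
gen prevscore    = (total - month_total) - (own - ind_entry)
gen prevdistinct = (total_first - month_first) - (own_first - firstentry)

sort ind_id yearmonth
list yearmonth ind_id ind_entry prevscore prevdistinct, sepby(ind_id) noobs
```

Because the sketch never uses a lag, an individual who enters the window late still gets a value in its first observed month, and gaps in an individual's months do not shift the comparison. prevdistinct applies the same construction to a flag marking each individual's first entry, so each other individual is counted at most once. In the example, prevscore for ind_id 4 stays at 0 until yearmonth 15, when ind_id 2's entry in yearmonth 14 becomes visible.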