Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: creating variable summarizing for each individual properties of other members of a group at t-1
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: creating variable summarizing for each individual properties of other members of a group at t-1
Date: Thu, 19 May 2011 14:30:14 +0100
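How about using -tsfill- first? Filling in the missing yearmonths for every ind_id means that the totals by yearmonth and the [_n-1] lags refer to the same calendar months for everyone. That would also solve some problems with other commands.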
Nick
[email protected]
Erik Aadland
Thank you so much, Nick. This is very close.
I tried the code on a small new test dataset with unbalanced observations. It seems that if an ind_id has ind_entry = 1 in a yearmonth in which other ind_ids do not occur, that entry is not captured in the peer-entry scores of those other ind_ids in their later yearmonths. In the data below, the entry of ind_id 56 in yearmonth 13 is not reflected in the first subsequent yearmonth (22) of ind_ids 2 and 4. Similarly, the entry of ind_id 2 in yearmonth 22 is not reflected in prevscore2 for ind_id 56, since ind_id 56 has no observation for yearmonth 22. Is there a smart way to account for these unbalanced observations?
Here is the new test data with the unbalanced observations, with the correct prevscore2 in the rightmost column.
year  month  yearmonth  ind_id  ind_entry  ind_score2  all_score2  prevscore2  CORRECT_prevscore2
2003     12         12       2          0           0           1           .                   .
2004     10         22       2          1           1           1           1                   2
2004     11         23       2          0           1           2           1                   2
2004     12         24       2          1           1           1           2                   2
2003     12         12       4          1           1           0           .                   .
2004     10         22       4          0           1           1           0                   1
2004     11         23       4          0           1           2           1                   2
2004     12         24       4          0           1           1           2                   2
2003     12         12      56          0           0           1           .                   .
2004      1         13      56          1           1           0           1                   1
2004     11         23      56          1           1           2           0                   2
Sincerely,
Erik.
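A minimal sketch of the -tsfill- suggestion, applied to the test data above. It assumes that yearmonth is an integer month counter and ind_id is numeric, as in the listing; the variable name observed is made up for illustration. Filling in every ind_id by yearmonth combination with -tsfill, full- lets the unchanged code from the earlier reply see every entrant in every month, and on this data the resulting prevscore2 matches the CORRECT column, including the missing values in yearmonth 12.

```stata
clear
input year month yearmonth ind_id ind_entry
2003 12 12  2 0
2004 10 22  2 1
2004 11 23  2 0
2004 12 24  2 1
2003 12 12  4 1
2004 10 22  4 0
2004 11 23  4 0
2004 12 24  4 0
2003 12 12 56 0
2004  1 13 56 1
2004 11 23 56 1
end

* flag the real observations (hypothetical name) so filled-in rows can be dropped later
generate byte observed = 1

* declare the panel and add every yearmonth for every ind_id
tsset ind_id yearmonth
tsfill, full
replace observed = 0 if missing(observed)
replace ind_entry = 0 if missing(ind_entry)

* the code from the earlier reply, unchanged, now run on the balanced panel
bysort ind_id (yearmonth) : gen ind_score2 = min(sum(ind_entry), 1)
egen all_score2 = total(ind_score2), by(yearmonth)
replace all_score2 = all_score2 - ind_score2
bysort ind_id (yearmonth) : gen prevscore2 = all_score2[_n-1]

* go back to the original observations only
keep if observed
list yearmonth ind_id ind_entry prevscore2, sepby(ind_id)
```

Without the full option, -tsfill- fills gaps only between each ind_id's own first and last yearmonth, so an ind_id that drops out early would be missing from later totals even though it has already entered.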
----------------------------------------
> From: [email protected]
> To: [email protected]
> Date: Thu, 19 May 2011 13:34:33 +0100
> Subject: RE: st: creating variable summarizing for each individual properties of other members of a group at t-1
>
> That's much clearer to me, or rather I now realise some stupid misunderstandings of your earlier posts.
>
> I've used different variable names. I now suggest this.
>
> bysort ind_id (yearmonth) : gen ind_score2 = min(sum(ind_entry), 1)
> egen all_score2 = total(ind_score2), by(yearmonth)
> replace all_score2 = all_score2 - ind_score2
> bysort ind_id (yearmonth) : gen prevscore2 = all_score2[_n-1]
>
> Note that this code, which reproduces your CORRECT column, is much simpler than my earlier, flawed versions.
>
> Nick
> [email protected]
>
> Erik Aadland
>
> I will try to explain why the suggested code below does not solve my second problem and what it gets wrong. The all_score for a given ind_id includes the ind_entry = 1 contribution of that same ind_id. I need the variable to sum, over yearmonths, the number of peer entrants (unique ind_ids excluding the focal ind_id) for each ind_id. Once an ind_id has experienced ind_entry = 1, it is considered an entrant, and subsequent ind_entry = 1 values for that ind_id do not change that. With the code below, an ind_id gets an all_score and a prevscore that include its own entry. It seems problematic to me to consider an ind_id to be a peer of itself.
>
> See the resulting output below. I have added a column on the right giving the correct prevscore for each ind_id.
>
> year  month  yearmonth  ind_id  ind_entry  ind_score  all_score  prevscore  CORRECT_prevscore
> 2003     10         10       2          0          0          1          .                  .
> 2003     11         11       2          0          0          1          1                  1
> 2003     12         12       2          0          0          1          1                  1
> 2004      1         13       2          0          0          1          1                  1
> 2004      2         14       2          1          1          1          1                  1
> 2004      3         15       2          0          0          2          1                  1
> 2003     10         10       4          1          1          0          .                  .
> 2003     11         11       4          0          0          1          0                  0
> 2003     12         12       4          1          0          1          1                  0
> 2004      1         13       4          0          0          1          1                  0
> 2004      2         14       4          0          0          2          1                  0
> 2004      3         15       4          0          0          2          2                  1
>
> Sincerely,
>
> Erik Aadland.
>
>
>
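To show what each line of the four-line code in the reply above does, here is a minimal sketch that runs it on the balanced test data just above (only the variables the code uses are entered). The comments describe each step; on this data prevscore2 reproduces the CORRECT prevscore column.

```stata
clear
input yearmonth ind_id ind_entry
10 2 0
11 2 0
12 2 0
13 2 0
14 2 1
15 2 0
10 4 1
11 4 0
12 4 1
13 4 0
14 4 0
15 4 0
end

* 1 from an ind_id's first entry onwards, 0 before it; later entries add nothing
bysort ind_id (yearmonth) : gen ind_score2 = min(sum(ind_entry), 1)
* number of ind_ids that have entered by each yearmonth ...
egen all_score2 = total(ind_score2), by(yearmonth)
* ... minus the focal ind_id itself, so nobody counts as their own peer
replace all_score2 = all_score2 - ind_score2
* value at the previous observation of the same ind_id
bysort ind_id (yearmonth) : gen prevscore2 = all_score2[_n-1]
```

Because [_n-1] is the previous observation of the same ind_id, it is the previous calendar month only when no months are missing, which is why this code needs a balanced panel.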
> > From: [email protected]
> > To: [email protected]
> > Date: Wed, 18 May 2011 18:36:37 +0100
> > Subject: RE: st: creating variable summarizing for each individual properties of other members of a group at t-1
> >
> > My code
> >
> > bysort ind_id (yearmonth) : gen ind_score = sum(ind_entry)
> > by ind_id : replace ind_score = ind_score == 1 & ind_score[_n-1] != 1
> > sort yearmonth
> > gen all_score = sum(ind_score)
> > by yearmonth : replace all_score = all_score[_N]
> > replace all_score = all_score - ind_score
> > bysort ind_id (yearmonth) : gen prevscore = all_score[_n-1]
> >
> > was certainly intended to solve your second problem. I've not tested it. Are you saying it doesn't? And if it doesn't what does it get wrong?
> >
> > Nick
> > [email protected]
> >
> > Erik Aadland
> >
> > Thank you Nick and Jorge for your suggestions. They were very helpful, and I am very grateful.
> >
> > Jorge, your suggested code below worked perfectly for my "first" variable.
> >
> > I am still struggling with my "second" variable. In the "second" variable, I am trying to create a variable that, for each ind_id, counts the total number of other ind_ids in the dataset (excluding the focal ind_id) that have experienced ind_entry = 1 at least once up to and including yearmonth - 1. In other words, I am trying to create a variable that, for each individual, tracks the number of other entrants in the dataset up to and including yearmonth - 1. I am trying to track ind_ids that have entered, not how many times they have entered.
> >
> > Any and all input on this problem would be very much appreciated.
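The definition above can also be computed directly from each ind_id's first entry date, without filling in the panel. This is a different technique from the ones used in the thread: a minimal sketch, assuming an integer yearmonth, with the hypothetical variable names first_entry, one_per_id and prevscore3. It counts entries up to and including calendar month yearmonth - 1, and the loop visits every observation, so it will be slow on large datasets. On the unbalanced test data from the latest message it agrees with the CORRECT prevscore2 column, except that it gives 0 rather than missing in each ind_id's first yearmonth.

```stata
clear
input yearmonth ind_id ind_entry
12  2 0
22  2 1
23  2 0
24  2 1
12  4 1
22  4 0
23  4 0
24  4 0
12 56 0
13 56 1
23 56 1
end

* first yearmonth in which each ind_id has ind_entry == 1 (missing if never)
egen first_entry = min(cond(ind_entry == 1, yearmonth, .)), by(ind_id)
* tag one observation per ind_id so that each ind_id is counted once
egen byte one_per_id = tag(ind_id)

* for each observation, count other ind_ids whose first entry is at or before yearmonth - 1
* (a missing first_entry is larger than any number, so never-entrants are not counted)
generate prevscore3 = .
forvalues i = 1/`=_N' {
    quietly count if one_per_id & ind_id != ind_id[`i'] & first_entry <= yearmonth[`i'] - 1
    quietly replace prevscore3 = r(N) in `i'
}
list yearmonth ind_id ind_entry first_entry prevscore3, sepby(ind_id)
```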
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/