Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Re: Create Variable of Groupings |
Date | Sat, 2 Nov 2013 12:01:47 +0000 |
Joseph gives excellent advice. I add two footnotes.

The reason that applying -egen-'s -group()- function to -id3- did not
yield a "sequential" variable of the kind you want is that -id3- is a
string variable. To take a simpler example than yours, which is enough
to make the point, the strings "12", "23", "123" will necessarily sort
to "12", "123", "23". Put otherwise, Stata is doing exactly what your
syntax asks: it sorts a string variable and assigns identifiers in
alphanumeric order. The result is sequential given what is supplied.

The generic FAQ
http://www.stata.com/support/faqs/data-management/creating-group-identifiers/
might also be of use or interest.

Nick
njcoxstata@gmail.com

On 2 November 2013 08:38, Joseph Coveney <stajc2@gmail.com> wrote:
> Lisa Wang wrote:
>
> I want to create a new variable that groups my observations so I can
> do something like a panel analysis.
>
> I have variables: identifiers date amount id3. id3 is a concatenation
> of identifiers and date.
>
> For instance,
>
> identifiers | date | amount | id3
> 1007 | 17aug2006 | 10 | 1007 17030
> 1007 | 17aug2006 | 7 | 1007 17030
> 1007 | 17aug2006 | 8.5 | 1007 17030
> 2049 | 26may2009 | 10 | 2049 18043
> 2049 | 26may2009 | 7 | 2049 18043
> 2049 | 12mar2007 | 7 | 2049 17237
> 2049 | 12mar2007 | 7 | 2049 17237
> 2049 | 12mar2007 | 7 | 2049 17237
>
> I would like it to output event_id = 1 for 1007 17030, 2 for 2049
> 18043, 3 for 2049 17237 etc etc....down the page.
>
> But at this point it seems to give me 2681 for 1007 17030, 5130 for
> 2049 18043 (ie. it is not sequential).
>
> I tried this:
> - bysort id* date : gen event_id = _n - but that gives me numbering
> WITHIN groups
> and also tried:
> - egen event_id = group(id3) - but it was not sequential. Do you think
> I need to do a by or sort beforehand?
>
>
> Thank you in advance for all your helpful suggestions as I am
> currently stuck and can't proceed.
>
> --------------------------------------------------------------------------------
>
> See the line of code below, starting at "Begin here".
>
> Joseph Coveney
>
> . input long identifiers str9 date double amount str1 id3
>
>      identifiers       date      amount        id3
>   1. 1007 17aug2006 10 1007 17030
>   2. 1007 17aug2006 7 1007 17030
>   3. 1007 17aug2006 8.5 1007 17030
>   4. 2049 26may2009 10 2049 18043
>   5. 2049 26may2009 7 2049 18043
>   6. 2049 12mar2007 7 2049 17237
>   7. 2049 12mar2007 7 2049 17237
>   8. 2049 12mar2007 7 2049 17237
>   9. end
>
> . quietly replace id3 = string(identifiers) + ///
>>         " " + string(date(date, "DMY"))
>
> .
> . *
> . * Begin here
> . *
> . generate byte event_id = sum(id3 != id3[_n-1])
>
> .
> . list, noobs sepby(event_id)
>
>   +-------------------------------------------------------+
>   | identi~s        date   amount          id3   event_id |
>   |-------------------------------------------------------|
>   |     1007   17aug2006       10   1007 17030          1 |
>   |     1007   17aug2006        7   1007 17030          1 |
>   |     1007   17aug2006      8.5   1007 17030          1 |
>   |-------------------------------------------------------|
>   |     2049   26may2009       10   2049 18043          2 |
>   |     2049   26may2009        7   2049 18043          2 |
>   |-------------------------------------------------------|
>   |     2049   12mar2007        7   2049 17237          3 |
>   |     2049   12mar2007        7   2049 17237          3 |
>   |     2049   12mar2007        7   2049 17237          3 |
>   +-------------------------------------------------------+
>
> .
> . exit
>
> end of do-file
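
The two numbering schemes discussed in this thread can be compared side by side. What follows is a minimal sketch, not code from either poster: it retypes the eight example observations, and the names ndate, by_order and by_sort are made up for illustration. It assumes the date is first converted to a numeric daily date, so that no string concatenation is involved, and it uses -long- rather than -byte- because a byte holds integers only up to 100, while the identifiers 2681 and 5130 reported in the question suggest thousands of groups in the real data.

```stata
* Sketch only: toy data retyped from the example in the thread
clear
input long identifiers str9 date double amount
1007 "17aug2006" 10
1007 "17aug2006" 7
1007 "17aug2006" 8.5
2049 "26may2009" 10
2049 "26may2009" 7
2049 "12mar2007" 7
2049 "12mar2007" 7
2049 "12mar2007" 7
end

* Numeric daily date (ndate is a hypothetical name)
generate long ndate = date(date, "DMY")
format ndate %td

* Numbering by order of appearance: running count of the observations
* at which either component differs from the previous observation.
* Computed first, so that it reflects the data order as typed.
generate long by_order = sum(identifiers != identifiers[_n-1] | ///
    ndate != ndate[_n-1])

* Numbering by sorted order of the group variables, which is what
* -egen, group()- produces
egen long by_sort = group(identifiers ndate)

list, noobs sepby(by_order)
```

On these data the two variables disagree for identifier 2049: by_order follows the order in which the dates appear in the data, whereas by_sort follows date order within identifier. Note also that the running-sum approach starts a new number whenever adjacent observations differ, so it assumes that the observations of each group are stored contiguously.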