


Re: st: Re: Create Variable of Groupings

From   Nick Cox <>
To   "" <>
Subject   Re: st: Re: Create Variable of Groupings
Date   Sat, 2 Nov 2013 12:01:47 +0000

Joseph gives excellent advice.

I add two footnotes.

The reason that applying -egen-'s -group()- function to -id3- did not
yield a "sequential" variable as you want it is that -id3- is a string
variable. Strings sort character by character, not by numeric value.
A simpler example than those here is enough to make the point: "12",
"23", "123" will necessarily sort to "12", "123", "23". Otherwise put,
Stata is doing exactly what your syntax asks: it sorts a string
variable and assigns identifiers in alphanumeric order. The result is
sequential given what is supplied.
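
As a minimal sketch of that point, the following compares -group()-
applied to string values with -group()- applied to their numeric
counterparts. The variable names s, n, g_str and g_num are made up for
illustration and do not come from the thread.

```stata
clear
input str3 s
"12"
"23"
"123"
end

* Numeric counterpart of the same values
generate n = real(s)

* group() numbers the distinct values in sort order:
* string order for s, numeric order for n
egen g_str = group(s)
egen g_num = group(n)

sort s
list, noobs
```

After -sort s- the rows appear as "12", "123", "23", so g_str runs
1, 2, 3 while g_num runs 1, 3, 2.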

The generic FAQ

might also be of use or interest.


On 2 November 2013 08:38, Joseph Coveney <> wrote:
> Lisa Wang wrote:
> I want to create a new variable that groups my observations so I can
> do something like a panel analysis.
> I have variables: identifiers date amount id3. id3 is a concatenation
> of identifiers and date.
> For instance,
> identifiers | date | amount | id3
> 1007 | 17aug2006 | 10 | 1007 17030
> 1007 | 17aug2006 | 7 | 1007 17030
> 1007 | 17aug2006 | 8.5 | 1007 17030
> 2049 | 26may2009 | 10 | 2049 18043
> 2049 | 26may2009 | 7 | 2049 18043
> 2049 | 12mar2007 | 7 | 2049 17237
> 2049 | 12mar2007 | 7 | 2049 17237
> 2049 | 12mar2007 | 7 | 2049 17237
> I would like it to output event_id = 1 for 1007 17030, 2 for 2049
> 18043, 3 for 2049 17237 etc etc....down the page.
> But at this point it seems to give me 2681 for 1007 17030, 5130 for
> 2049 18043 (ie. it is not sequential).
> I tried this:
> - bysort id* date : gen event_id = _n - but that gives me numbering
> WITHIN groups
> and also tried:
> - egen event_id = group(id3) - but it was not sequential. Do you think
> I need to do a by or sort beforehand?
> Thank you in advance for all your helpful suggestions as I am
> currently stuck and can't proceed.
> --------------------------------------------------------------------------------
> See the line of code below, starting at "Begin here".
> Joseph Coveney
> . input long identifiers str9 date double amount str1 id3
>       identifiers       date      amount        id3
>   1. 1007 17aug2006 10 1007 17030
>   2. 1007 17aug2006 7 1007 17030
>   3. 1007 17aug2006 8.5 1007 17030
>   4. 2049 26may2009 10 2049 18043
>   5. 2049 26may2009 7 2049 18043
>   6. 2049 12mar2007 7 2049 17237
>   7. 2049 12mar2007 7 2049 17237
>   8. 2049 12mar2007 7 2049 17237
>   9. end
> . quietly replace id3 = string(identifiers) + ///
>>     " " + string(date(date, "DMY"))
> .
> . *
> . * Begin here
> . *
> . generate byte event_id = sum(id3 != id3[_n-1])
> .
> . list, noobs sepby(event_id)
>   +-------------------------------------------------------+
>   | identi~s        date   amount          id3   event_id |
>   |-------------------------------------------------------|
>   |     1007   17aug2006       10   1007 17030          1 |
>   |     1007   17aug2006        7   1007 17030          1 |
>   |     1007   17aug2006      8.5   1007 17030          1 |
>   |-------------------------------------------------------|
>   |     2049   26may2009       10   2049 18043          2 |
>   |     2049   26may2009        7   2049 18043          2 |
>   |-------------------------------------------------------|
>   |     2049   12mar2007        7   2049 17237          3 |
>   |     2049   12mar2007        7   2049 17237          3 |
>   |     2049   12mar2007        7   2049 17237          3 |
>   +-------------------------------------------------------+
> .
> . exit
> end of do-file
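
The -sum()- line above counts a new event whenever -id3- differs from
the previous observation, so it relies on identical values of -id3-
sitting in adjacent rows. Note also that a byte holds integers only up
to 100, which may be too small for a dataset with thousands of events.
What follows is only a sketch under those two concerns, not Joseph's
method: it numbers events in order of first appearance even when rows
are interleaved. The names obs_order and first_seen are hypothetical,
and the interleaved toy data are invented for illustration.

```stata
clear
input long identifiers str9 date double amount
1007 17aug2006 10
2049 26may2009 10
1007 17aug2006 7
2049 12mar2007 7
2049 26may2009 7
2049 12mar2007 7
end

* Rebuild id3 as in the thread: identifier, a space, the numeric daily date
generate id3 = string(identifiers) + " " + string(date(date, "DMY"))

* Remember the original row order so that it can be restored later
generate long obs_order = _n

* Within each id3, record the row at which that value first appears
bysort id3 (obs_order): generate long first_seen = obs_order[1]

* Number the groups by first appearance; long avoids the byte limit of 100
egen long event_id = group(first_seen)

sort obs_order
list, noobs sepby(event_id)
```

With these toy rows event_id should come out as 1, 2, 1, 3, 2, 3 in
the original order.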
