Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Identifying and recording the first occurrence of an event by actor given category
From
Erik Aadland <[email protected]>
To
<[email protected]>
Subject
RE: st: Identifying and recording the first occurrence of an event by actor given category
Date
Fri, 7 Sep 2012 11:36:10 +0000
Thank you, Nick, for your solution and for the interesting reference!
To make the first occurrence conditional on specific category values (e.g. 2), I modified the code as follows:
egen first_1 = min(year / (event == 1 & category_id == 2)), by(actor_id)
This modification appears to work well, too.
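As a check, here is a minimal sketch that runs the modified command on the six example rows from the original post. The variable name first_occurrence is taken from the question; nothing else is assumed beyond the data shown there.

```stata
* Sketch only: the six rows are the example data from the original post.
clear
input actor_id year category_id event
1 2000 1 .
1 2000 2 1
1 2001 2 1
2 2003 3 .
2 2003 2 1
2 2004 2 .
end

* Where the condition is false, the divisor is 0 and the ratio is missing;
* min() ignores missings, so only qualifying years compete for the minimum.
egen first_occurrence = min(year / (event == 1 & category_id == 2)), by(actor_id)

list, sepby(actor_id) noobs
```

On these rows the result is 2000 for actor 1 and 2003 for actor 2, matching the table requested in the original question.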
Kind regards,
Erik.
> Date: Fri, 7 Sep 2012 10:47:31 +0100
> Subject: Re: st: Identifying and recording the first occurrence of an event by actor given category
> From: [email protected]
> To: [email protected]
>
> Stata is great at this kind of problem. The essence of Erik's
> difficulty is the need to look in other observations for the same
> panel to produce the new variable.
>
> First off, the first year anything occurred is just the minimum year
> anything occurred, so we can get at that minimum in several ways:
> sorting, using -summarize-, -egen- etc.
>
> Given the panel structure, -egen- is a good tool, because functions
> that support a -by()- option or a -by:- prefix will handle panels
> separately.
>
> Here is one solution:
>
> egen first_1 = min(year / (event == 1)), by(actor_id)
>
> Here is another:
>
> egen first_1 = min(cond(event == 1, year, .)), by(actor_id)
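The two forms above can be compared side by side. This is a minimal sketch on made-up data (the five rows and the names year_if_event, first_div and first_cond are hypothetical, chosen only for illustration); it assumes nothing beyond the two commands quoted above.

```stata
* Sketch only: hypothetical data; actor 2 never experiences event == 1.
clear
input actor_id year event
1 2000 .
1 2001 1
1 2002 1
2 2003 .
2 2004 0
end

* Intermediate step made visible: (event == 1) evaluates to 1 or 0,
* so the ratio is year when true and missing (division by zero) when false.
generate year_if_event = year / (event == 1)

egen first_div  = min(year / (event == 1)), by(actor_id)
egen first_cond = min(cond(event == 1, year, .)), by(actor_id)

* Both forms should agree in every observation, including all-missing panels.
assert first_div == first_cond
list, sepby(actor_id) noobs
```

Note that a missing event counts as "not 1" here, because the expression event == 1 is false when event is missing. An actor with no qualifying observation ends up with a missing value in both new variables.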
>
> This approach is discussed in detail within
>
> Cox, N.J. 2011. Speaking Stata: Compared with ... Stata Journal 11(2): 305-314
>
> Abstract. Many problems in data management center on relating values
> to values in other observations, either within a dataset as a whole or
> within groups such as panels. This column reviews some basic Stata
> techniques helpful for such tasks, including the use of subscripts,
> summarize, by:, sum(), cond(), and egen. Several techniques exploit
> the fact that logical expressions yield 1 when true and 0 when false.
> Dividing by zero to yield missings is revealed as a surprisingly
> valuable device.
>
> Erik's question appears a bit more complicated than I have answered
> here; if there is some twist I have missed no doubt he will make that
> clear.
>
> Nick
>
> On Fri, Sep 7, 2012 at 10:07 AM, Erik Aadland <[email protected]> wrote:
>
> > I have an unbalanced panel dataset.
> > This is the structure:
> > actor_id  year  category_id  event
> > 1         2000  1            .
> > 1         2000  2            1
> > 1         2001  2            1
> > 2         2003  3            .
> > 2         2003  2            1
> > 2         2004  2            .
> >
> > I want to generate a variable -first_occurrence- that records, for each actor_id, the value of -year- in which the actor first experienced event == 1 within a given category (e.g. category_id == 2). Some actors never experience event == 1.
> > For instance, if I track first occurrence for category_id == 2, this is what I am looking for:
> > actor_id  year  category_id  event  first_occurrence
> > 1         2000  1            .      2000
> > 1         2000  2            1      2000
> > 1         2001  2            1      2000
> > 2         2003  3            .      2003
> > 2         2003  2            1      2003
> > 2         2004  2            .      2003
> >
> > Any input or suggestions on this problem would be greatly appreciated.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/