Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Data management _flag on the basis of the frequency
From
Eric Booth <[email protected]>
To
[email protected]
Subject
Re: st: Data management _flag on the basis of the frequency
Date
Tue, 13 Mar 2012 18:21:24 -0500
<>
I think NJC's example for looping over the observations is instructive. -reshape- can be slow with large datasets on machines with limited computing power and my coding is far less straightforward than his method.
I adapted his code example to the example I showed earlier and there is one minor error in that the "year(date[`i']), year(date[`i']) -2)" part of the inlist() command should be reversed (I should be sure to reiterate NJC's mention that he hadn't tested this code example).
Our examples produce the same result if you remove this line from my example:
bys director_id month year: replace mark= . if _n != _N
which recodes 'mark' to missing for these repeated values. NJC's code marks all the duplicate years - mine did not due to the line above. It's still not clear which the OP wanted, but this reconciles that difference in the outcome of our code.
Finally, NJC's code does take into account the 'firm_id' which should be added to the -bysort- prefix part of several lines in my example. My code works only because 'firm_id' is a constant in the example data.
So, if you append this version of NJC's code to my earlier example, you'll get the same results with the two different approaches:
*********************!
**clean up over from my example
**first, comment this out in my example above--
******bys director_id month year: replace mark= . if _n != _N
recode mark (.=0)
***NJC's example with minor corrections:
gen flag = 0
gen possible = 0
quietly forval i = 1/`=_N' {
replace possible = firm_id == firm_id[`i'] & ///
director == director[`i'] & ///
inrange(year(date), year(date[`i']) - 2, year(date[`i'])) ///
& month(date) == month(date[`i'])
tab year if possible
if r(r) == 3 replace flag = 1 in `i'
}
sort firm dir month year
assert mark==flag //assumes you commented out the last line in my example
l if inlist(1, flag, mark)
*********************!
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
+979.845.6754
On Mar 13, 2012, at 2:30 PM, Nick Cox wrote:
> Here's another way to approach it. I assume the existence of a daily
> date variable -date-.
>
> gen flag = 0
> gen possible = 0
>
> quietly forval i = 1/`=_N' {
> replace possible = firm_id == firm_id[`i'] & director ==
> director[`i'] & inrange(year(date), year(date[`i']), year(date[`i']) -
> 2) & month(date) == month(date[`i'])
>
> tab year if possible
> if r(r) == 3 replace flag = 1 in `i'
> }
>
> The idea is a loop over observations. The rules are (a) same firm (b)
> same director (c) this year or previous two (d) same month. That
> defines possibles. We still need to check that all three years are
> represented in the possibles, which is true if a table of years has
> precisely three rows.
>
> This is the often despised "loop over observations". Occasionally it
> does yield code that is rather direct if also rather messy. It is not
> very fast, but it is often relatively quick to code.
>
> Code not tested. For more discussion of the strategy, see
>
> SJ-7-3 pr0033 . . . . . . . . . . . . . . Stata tip 51: Events in intervals
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q3/07 SJ 7(3):440--443 (no commands)
> tip for counting or summarizing irregularly spaced
> events in intervals
>
> which is accessible to all via the Stata Journal website.
>
> However, the rules don't seem complete. If there is a record in
> 9/1998, 9/1999, 9/2000 shouldn't the last be flagged 1 similarly?
>
> Nick
>
> On Tue, Mar 13, 2012 at 12:55 PM, Agnieszka Trzeciakiewicz
> <[email protected]> wrote:
>
>> I'm trying to mark an occurrence of an event in a particular firm, made by a
>> particular director.
>>
>> The idea is to mark an occurrence with number 1, if there is an available
>> record in the same month over the past three consecutive years.
>> For instance if for a director A there is a record in 09/2000, 09/2001, and
>> 09/2002 I would like to assign year 2000 with mark 0, year 2001 with mark 0
>> and 2002 with mark 1.
>>
>> I have used Excel to solve my problem, and applied a relevant long formula .
>> However, Excel broke down each time I run it. The original file has over
>> 50000 records. Please find the table below.
>>
>> Is it possible to create ''mark'' column using STATA software and its data
>> management codes?
>> Thanks for your help.
>> Best wishes,
>> Agnieszka
>>
>> Table:
>> full date firm_id director_id mark
>> 22/10/2002 1 7 0
>> 07/04/2003 1 7 0
>> 11/04/2003 1 7 0
>> 01/10/2003 1 7 0
>> 01/10/2003 1 7 0
>> 20/10/2003 1 7 0
>> 07/04/2004 1 7 0
>> 08/04/2004 1 7 0
>> 16/04/2004 1 7 0
>> 25/10/2004 1 7 1
>> 12/11/2004 1 7 0
>> 07/04/2005 1 7 1
>> 27/04/2005 1 7 1
>> 25/08/2005 1 7 0
>> 05/09/2005 1 7 0
>> 12/09/2005 1 7 0
>> 25/10/2005 1 7 1
>> 25/04/2006 1 7 1
>> 25/04/2006 1 7 1
>> 05/09/2006 1 7 0
>> 05/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 28/09/2006 1 7 0
>> 25/10/2006 1 7 1
>> 27/04/2007 1 7 1
>> 27/04/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 06/09/2007 1 7 1
>> 26/10/2007 1 7 1
>> 01/11/2007 1 7 0
>> 31/03/2008 1 7 0
>> 31/03/2008 1 7 0
>> 24/04/2008 1 7 1
>> 24/04/2008 1 7 1
>> 03/09/2008 1 7 1
>> 03/09/2008 1 7 1
>> 16/09/2008 1 7 1
>> 14/10/2008 1 7 1
>> 07/04/2009 1 7 1
>> 07/04/2009 1 7 1
>> 07/09/2009 55 7 0
>> 07/09/2009 55 7 0
>> 07/09/2009 55 7 0
>> 23/09/2009 55 7 0
>> 23/09/2009 55 7 0
>> 23/09/2009 1 7 1
>> 08/10/2009 1 7 1
>> 22/03/2010 1 7 0
>> 22/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 23/03/2010 1 7 0
>> 12/04/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 07/09/2010 1 7 1
>> 10/09/2010 1 7 1
>> 06/10/2010 1 7 1
>> 27/05/2009 2 8 0
>> 09/12/2009 2 8 0
>> 05/02/2010 2 8 0
>> 20/05/2010 2 8 0
>> 23/08/2010 2 8 0
>> 22/04/2008 3 10 0
>> 18/12/2008 3 10 0
>> 24/04/2009 3 10 0
>> 24/04/2009 3 10 0
>> 19/03/2010 3 10 0
>> 19/03/2010 3 10 0
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/