Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Create a flag (panel data)
From
R Zhang <[email protected]>
To
[email protected]
Subject
Re: st: Create a flag (panel data)
Date
Fri, 7 Mar 2014 20:57:30 -0500
Thanks a lot, Nick !!!
Your code works especially well for my huge data set (it is still
running with over 17 million observations).
On Fri, Mar 7, 2014 at 5:56 AM, Nick Cox <[email protected]> wrote:
> An alternative, crude but relatively easy to think about, is a loop
> over observations.
>
> This is documented in
>
> SJ-7-3 pr0033 . . . . . . . . . . . . . . Stata tip 51: Events in intervals
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q3/07 SJ 7(3):440--443 (no commands)
> tip for counting or summarizing irregularly spaced
> events in intervals
>
> which is accessible at http://www.stata-journal.com/article.html?article=pr0033
>
> I am assuming that you will read the paper to understand the idea, but
> this illustrates some technique.
>
> gen howmany = 0
>
> quietly forval i = 1/`=_N' {
> count if firmid == firmid[`i'] ///
> & (patentID == citedID[`i'] | citedID == citedID[`i']) ///
> & inrange(year, year[`i']-5, year[`i']-1)
> replace howmany = r(N) in `i'
> }
>
> The conditions are
>
> 1. the same firm id
>
> 2. this citedID was a previous patentID or a previous citedID
>
> 3. previous 5 years, e.g. for 1995, consider 1990-1994
>
> Clearly you should modify for different definition of "previous 5
> years" and for anything I misunderstood.
>
> Note that
>
> count if firmid == firmid[`i'] ///
> & (patentID == citedID[`i'] | citedID == citedID[`i']) ///
> & inrange(year, year[`i']-4, year[`i'])
>
> would always catch the current observation.
>
> A count can easily be turned into an indicator
>
> gen wascited = howmany > 0
>
> Nick
> [email protected]
>
>
> On 7 March 2014 05:55, Roberto Ferrer <[email protected]> wrote:
>> I think this will give you what you want, but it will work depending
>> on the size of your database because -joinby- creates, at least
>> temporarily, new observations.
>>
>> *----------------------- begin code --------------------
>>
>> *clear all
>> set more off
>>
>> *----------------------- example data --------------------
>>
>> input ///
>> year str2 firmid patentID citedID
>> 1995 "AA" 100001 100002
>> 1995 "AA" 100001 100003
>> 1995 "AA" 100001 100004
>> 1994 "AA" 110001 100002
>> 1994 "AA" 110001 100005
>> 1994 "AA" 110001 120001
>> 1993 "AA" 120001 100006
>> 1993 "AA" 120001 100007
>> 1992 "AA" 130001 100008
>> 1992 "AA" 130001 100009
>> 1991 "AA" 140001 1000010
>> 1991 "AA" 140001 100011
>> 1989 "AA" 140001 100011
>> 1988 "AA" 140001 100011
>> 1995 "BB" 100001 100002
>> 1995 "BB" 100001 100003
>> 1995 "BB" 100001 100004
>> 1994 "BB" 110001 100002
>> 1994 "BB" 110001 100005
>> 1994 "BB" 110001 120001
>> 1993 "BB" 120001 100006
>> 1993 "BB" 120001 100007
>> 1992 "BB" 130001 100008
>> 1992 "BB" 130001 100009
>> 1991 "BB" 140001 1000010
>> 1991 "BB" 140001 100011
>> end
>>
>> sort firmid year
>> list, sepby(firmid)
>>
>> tempfile main
>> save "`main'"
>>
>> *---------------------- what you want ? -------------------------
>>
>> rename (year patentID citedID) =0
>> joinby firmid using "`main'"
>>
>> sort firmid year0 year
>> by firmid: drop if year0 < year
>> by firmid: drop if (year0 - year) > 4
>>
>> by firmid: gen flag = (citedID0 == patentID | citedID0 == citedID) &
>> (year0 != year)
>>
>> drop year patentID citedID
>> rename *0 *
>>
>> collapse (sum) flag, by(year firmid patentID citedID)
>>
>> sort firmid year
>> replace flag = (flag > 0)
>> list, sepby(firmid)
>>
>> *---------------------- end code ----------------------------
>>
>> Insert -list- anywhere you like, to see what's going on.
>>
>> On Thu, Mar 6, 2014 at 10:07 PM, R Zhang <[email protected]> wrote:
>>> Dear Statalisters,
>>>
>>> I have the following panel data (only provided a small number of observations)
>>>
>>> PatentID: is the identification number for company AA's patent,
>>> citedID is the identification number of a patent that was cited by the
>>> focal patent. I want to generate a dummy that flags the citedID under
>>> the following condition:
>>>
>>> citedID=1 if this patent (e.g. 1995 100002 was firm AA's own patent
>>> filed over the past 5 years, Or 100002 was a patent that was cited by
>>> firm AA over the past 5 years). Could you suggest a good way of doing
>>> this? Thanks !!!
>>>
>>> Year firmid patentID citedID
>>>
>>> 1995 AA 100001 100002
>>>
>>> 1995 AA 100001 100003
>>>
>>> 1995 AA 100001 100004
>>>
>>> 1994 AA 110001 100002
>>>
>>> 1994 AA 110001 100005
>>>
>>> 1994 AA 110001 120001
>>>
>>> 1993 AA 120001 100006
>>>
>>> 1993 AA 120001 100007
>>>
>>> 1992 AA 130001 100008
>>>
>>> 1992 AA 130001 100009
>>>
>>> 1991 AA 140001 1000010
>>>
>>> 1991 AA 140001 100011
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/