Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | R Zhang <r05zhang@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Create a flag (panel data) |
Date | Fri, 7 Mar 2014 20:57:30 -0500 |
Thanks a lot, Nick !!! Your code works especially well for my huge data set (it is still running with over 17 million observations). On Fri, Mar 7, 2014 at 5:56 AM, Nick Cox <njcoxstata@gmail.com> wrote: > An alternative, crude but relatively easy to think about, is a loop > over observations. > > This is documented in > > SJ-7-3 pr0033 . . . . . . . . . . . . . . Stata tip 51: Events in intervals > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox > Q3/07 SJ 7(3):440--443 (no commands) > tip for counting or summarizing irregularly spaced > events in intervals > > which is accessible at http://www.stata-journal.com/article.html?article=pr0033 > > I am assuming that you will read the paper to understand the idea, but > this illustrates some technique. > > gen howmany = 0 > > quietly forval i = 1/`=_N' { > count if firmid == firmid[`i'] /// > & (patentID == citedID[`i'] | citedID == citedID[`i']) /// > & inrange(year, year[`i']-5, year[`i']-1) > replace howmany = r(N) in `i' > } > > The conditions are > > 1. the same firm id > > 2. this citedID was a previous patentID or a previous citedID > > 3. previous 5 years, e.g. for 1995, consider 1990-1994 > > Clearly you should modify for different definition of "previous 5 > years" and for anything I misunderstood. > > Note that > > count if firmid == firmid[`i'] /// > & (patentID == citedID[`i'] | citedID == citedID[`i']) /// > & inrange(year, year[`i']-4, year[`i']) > > would always catch the current observation. > > A count can easily be turned into an indicator > > gen wascited = howmany > 0 > > Nick > njcoxstata@gmail.com > > > On 7 March 2014 05:55, Roberto Ferrer <refp16@gmail.com> wrote: >> I think this will give you what you want, but it will work depending >> on the size of your database because -joinby- creates, at least >> temporarily, new observations. >> >> *----------------------- begin code -------------------- >> >> *clear all >> set more off >> >> *----------------------- example data -------------------- >> >> input /// >> year str2 firmid patentID citedID >> 1995 "AA" 100001 100002 >> 1995 "AA" 100001 100003 >> 1995 "AA" 100001 100004 >> 1994 "AA" 110001 100002 >> 1994 "AA" 110001 100005 >> 1994 "AA" 110001 120001 >> 1993 "AA" 120001 100006 >> 1993 "AA" 120001 100007 >> 1992 "AA" 130001 100008 >> 1992 "AA" 130001 100009 >> 1991 "AA" 140001 1000010 >> 1991 "AA" 140001 100011 >> 1989 "AA" 140001 100011 >> 1988 "AA" 140001 100011 >> 1995 "BB" 100001 100002 >> 1995 "BB" 100001 100003 >> 1995 "BB" 100001 100004 >> 1994 "BB" 110001 100002 >> 1994 "BB" 110001 100005 >> 1994 "BB" 110001 120001 >> 1993 "BB" 120001 100006 >> 1993 "BB" 120001 100007 >> 1992 "BB" 130001 100008 >> 1992 "BB" 130001 100009 >> 1991 "BB" 140001 1000010 >> 1991 "BB" 140001 100011 >> end >> >> sort firmid year >> list, sepby(firmid) >> >> tempfile main >> save "`main'" >> >> *---------------------- what you want ? ------------------------- >> >> rename (year patentID citedID) =0 >> joinby firmid using "`main'" >> >> sort firmid year0 year >> by firmid: drop if year0 < year >> by firmid: drop if (year0 - year) > 4 >> >> by firmid: gen flag = (citedID0 == patentID | citedID0 == citedID) & >> (year0 != year) >> >> drop year patentID citedID >> rename *0 * >> >> collapse (sum) flag, by(year firmid patentID citedID) >> >> sort firmid year >> replace flag = (flag > 0) >> list, sepby(firmid) >> >> *---------------------- end code ---------------------------- >> >> Insert -list- anywhere you like, to see what's going on. >> >> On Thu, Mar 6, 2014 at 10:07 PM, R Zhang <r05zhang@gmail.com> wrote: >>> Dear Statalisters, >>> >>> I have the following panel data (only provided a small number of observations) >>> >>> PatentID: is the identification number for company AA's patent, >>> citedID is the identification number of a patent that was cited by the >>> focal patent. I want to generate a dummy that flags the citedID under >>> the following condition: >>> >>> citedID=1 if this patent (e.g. 1995 100002 was firm AA's own patent >>> filed over the past 5 years, Or 100002 was a patent that was cited by >>> firm AA over the past 5 years). Could you suggest a good way of doing >>> this? Thanks !!! >>> >>> Year firmid patentID citedID >>> >>> 1995 AA 100001 100002 >>> >>> 1995 AA 100001 100003 >>> >>> 1995 AA 100001 100004 >>> >>> 1994 AA 110001 100002 >>> >>> 1994 AA 110001 100005 >>> >>> 1994 AA 110001 120001 >>> >>> 1993 AA 120001 100006 >>> >>> 1993 AA 120001 100007 >>> >>> 1992 AA 130001 100008 >>> >>> 1992 AA 130001 100009 >>> >>> 1991 AA 140001 1000010 >>> >>> 1991 AA 140001 100011 > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/