Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: summarizing data for each panel over chosen time windows
From: R Zhang <[email protected]>
To: [email protected]
Subject: st: summarizing data for each panel over chosen time windows
Date: Mon, 17 Mar 2014 22:58:56 -0400
Dear all,
I have a panel dataset with 17 million observations (firm-year
combinations), and for each firm I am creating a count over the past
five years. My original posting was
http://www.stata.com/statalist/archive/2014-03/msg00215.html
Please also refer to Nick's response. His code works just fine for
the hypothetical data I posted:
input ///
year str2 firmid patentID citedID
1995 "AA" 100001 100002
1995 "AA" 100001 100003
1995 "AA" 100001 100004
1994 "AA" 110001 100002
1994 "AA" 110001 100005
1994 "AA" 110001 120001
1993 "AA" 120001 100006
1993 "AA" 120001 100007
1992 "AA" 130001 100008
1992 "AA" 130001 100009
1991 "AA" 140001 100010
1991 "AA" 140001 100011
1989 "AA" 140001 100011
1988 "AA" 140001 100011
1995 "BB" 100001 100002
1995 "BB" 100001 100003
1995 "BB" 100001 100004
1994 "BB" 110001 100002
1994 "BB" 110001 100005
1994 "BB" 110001 120001
1993 "BB" 120001 100006
1993 "BB" 120001 100007
1992 "BB" 130001 100008
1992 "BB" 130001 100009
1991 "BB" 140001 100010
1991 "BB" 140001 100011
end
The issue I have now is that the real data have 17 million observations.
The program ran for several days until the computer suddenly shut down,
so I had to rerun it, and it is still going.
My question is: should I output the results in batches, so that an
unexpected computer shutdown does not force the whole program to start over?
What is good practice when running a program on a huge dataset?
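To make the batch idea concrete, here is a minimal sketch of what I have in mind; it has not been run on the real data. It splits the firms into batches, saves each batch's result to disk as soon as it is done, and on a rerun skips any batch whose file already exists. The names firmnum, per_batch, batch, nbatch, n_obs and the batch_#.dta files are hypothetical, the four data lines are a cut-down version of the toy data, and the bysort line is only a placeholder for the actual five-year count (Nick's code is not reproduced here).

```stata
* Sketch only: batch-wise processing with on-disk checkpoints.
clear
input year str2 firmid patentID citedID
1995 "AA" 100001 100002
1994 "AA" 110001 100005
1995 "BB" 100001 100003
1993 "BB" 120001 100006
end

* Assign every firm to a batch. group() numbers firms in sorted order,
* so the assignment is the same on every rerun.
egen long firmnum = group(firmid)
local per_batch = 1                 // hypothetical; use many firms per batch in practice
gen long batch = ceil(firmnum / `per_batch')
summarize batch, meanonly
local nbatch = r(max)

* For this sketch the full data go to a tempfile; with the real data
* this would be the permanent dataset on disk.
tempfile full
save `full'

forvalues b = 1/`nbatch' {
    * Skip batches already finished in an earlier, interrupted run
    capture confirm file "batch_`b'.dta"
    if _rc == 0 continue

    * Load only the firms belonging to this batch
    use if batch == `b' using `full', clear

    * Placeholder for the real computation (the five-year count)
    bysort firmid year: gen long n_obs = _N

    * Checkpoint: once this file exists, the batch is never recomputed
    save "batch_`b'.dta"
}

* Put the saved pieces back together
clear
forvalues b = 1/`nbatch' {
    append using "batch_`b'.dta"
}
```

One caveat with this design: if the computer goes down while a batch file is being written, that file may be incomplete, so the most recent batch_#.dta should be checked (or erased) before restarting.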
Any suggestions would be greatly appreciated!
-Rochelle