Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: How to count occurrences of specific value
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: How to count occurrences of specific value
Date
Wed, 1 May 2013 13:31:24 +0100
Your code wouldn't work as
id = id[`i']
should be
id == id[`i']
but I presume you just copied it incorrectly.
I don't know a sure-fire way to speed this up. It might just be faster
if you -expand-ed the data to one observation for every day. Then the
code would be simpler, but you would end up with several millions of
observations. Or you could translate the code into Mata.
I don't think -egen- will help you here at all. The essence of the
problem is comparing each observation with others for the same
identifier, and -egen- stops at simple recipes of that kind.
Nick
[email protected]
On 1 May 2013 13:03, Jia Peng <[email protected]> wrote:
> Dear All,
>
> I have a data set with the following structure,
>
> id date flag
> 95001 14jun2000 1
> 95001 12apr2000 1
> 95001 16mar2000 0
> 95001 16nov1999 0
> 95001 10may1999 1
> 95001 30mar1995 0
> 95002 01nov1989 0
> 95002 01mar1985 1
> 95002 01jun1983 0
> 95002 01may1983 1
> 95002 01dec1982 0
> 95002 01oct1982 0
>
> And now, I would like to generate a new variable, say temp, which represents
> for each observation how many times flag == 1 has occurred within the same
> id from five years ago to the date specified, i.e., for the first
> observation, I want to count how many times flag == 1 has occurred with the
> id 95001 between 14jun1995 and 14jun2000.
>
> I have tried to loop over every observation using the following code,
>
> gen temp = .
> local N = _N
> forvalues i = 1(1)`N' {
> count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 &
> (date[`i'] - date)/365.25 >= 0
> replace temp = r(N) in `i'
> }
>
> However, there are half a million observations in the entire data and the
> above code cost hours of time. Is there any way to solve the above problem
> more efficiently?
>
> I have also tried to use -egen-, but all I can get is how many times flag ==
> 1 has occurred with the same id. Is there any way to take into consideration
> different date ranges in this context?
>
> Any thoughts?
>
>
> Peng Jia
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/