Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: How to count occurrences of specific value
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: How to count occurrences of specific value
Date
Thu, 2 May 2013 16:30:40 +0100
The line
replace temp = temp[`i'] + temp[`j']
should presumably be
replace temp = temp[`i'] + temp[`j'] in `i'
Otherwise, -temp- ends up holding the last constant calculated.
Nick
[email protected]
On 2 May 2013 16:21, Michael Barker <[email protected]> wrote:
> It looks like you are comparing each observation to every other
> observation in your data set. If your data are sorted, you only have
> to look back within each 5-year window for each id. It looks like your
> data are sorted descending by date, so the code would look like this:
>
> gen temp = flag
> local N = _N
> forvalues i = 1(1)`N' {
> local j=`i'+1
> while (id[`i']==id[`j'] & (date[`i'] - date[`j'])/365.25 <= 5) {
> replace temp = temp[`i'] + temp[`j']
> local j = `j'+1
> }
>
> If your data were sorted ascending by date, you would just iterate j
> downwards (j = i-1, j=j-1) and start the "forvalues" loop at 2
> (forvalues 2(1)`N' {)
>
> Mike
>
> On Wed, May 1, 2013 at 8:31 AM, Nick Cox <[email protected]> wrote:
>> Your code wouldn't work as
>>
>> id = id[`i']
>>
>> should be
>>
>> id == id[`i']
>>
>> but I presume you just copied it incorrectly.
>>
>> I don't know a sure-fire way to speed this up. It might just be faster
>> if you -expand-ed the data to one observation for every day. Then the
>> code would be simpler, but you would end up with several millions of
>> observations. Or you could translate the code into Mata.
>>
>> I don't think -egen- will help you here at all. The essence of the
>> problem is comparing each observation with others for the same
>> identifier, and -egen- stops at simple recipes of that kind.
>> Nick
>> [email protected]
>>
>>
>> On 1 May 2013 13:03, Jia Peng <[email protected]> wrote:
>>> Dear All,
>>>
>>> I have a data set with the following structure,
>>>
>>> id date flag
>>> 95001 14jun2000 1
>>> 95001 12apr2000 1
>>> 95001 16mar2000 0
>>> 95001 16nov1999 0
>>> 95001 10may1999 1
>>> 95001 30mar1995 0
>>> 95002 01nov1989 0
>>> 95002 01mar1985 1
>>> 95002 01jun1983 0
>>> 95002 01may1983 1
>>> 95002 01dec1982 0
>>> 95002 01oct1982 0
>>>
>>> And now, I would like to generate a new variable, say temp, which represents
>>> for each observation how many times flag == 1 has occurred within the same
>>> id from five years ago to the date specified, i.e., for the first
>>> observation, I want to count how many times flag == 1 has occurred with the
>>> id 95001 between 14jun1995 and 14jun2000.
>>>
>>> I have tried to loop over every observation using the following code,
>>>
>>> gen temp = .
>>> local N = _N
>>> forvalues i = 1(1)`N' {
>>> count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 &
>>> (date[`i'] - date)/365.25 >= 0
>>> replace temp = r(N) in `i'
>>> }
>>>
>>> However, there are half a million observations in the entire data and the
>>> above code cost hours of time. Is there any way to solve the above problem
>>> more efficiently?
>>>
>>> I have also tried to use -egen-, but all I can get is how many times flag ==
>>> 1 has occurred with the same id. Is there any way to take into consideration
>>> different date ranges in this context?
>>>
>>> Any thoughts?
>>>
>>>
>>> Peng Jia
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/