Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: calculating cumulative values of other observations
From
Eric Booth <[email protected]>
To
[email protected]
Subject
Re: st: calculating cumulative values of other observations
Date
Sat, 7 Apr 2012 16:36:28 -0500
<>
Your new [if] condition is preventing the adapted version of Nick's code from creating the running sum properly. To see how the [if] condition you added is "filtering" the observations that are eligible for the running sum, create an indicator/flag for your [if] conditions and inspect the results:
***
bysort ID activity (date_start): gen ind = 1 if ///
(date_start>=date_end[_n-1] & !mi(date_end[_n-1]))
***
You should be able to see that this stops Stata from including the observations you wanted included in the sum using 'ind'.
Without seeing your new variables - you could probably break your new code into two steps like:
***
bysort ID activity (date_start): gen priorhrs = ///
sum(hours) - hours
replace priorhrs = 0 if priorhrs>0 & ///
(date_start<date_end[_n-1] & !mi(date_end[_n-1]))
***
to get what you need.
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
+979.845.6754
On Apr 7, 2012, at 4:12 PM, KOTa wrote:
> thanks Eric,
> i did adapt your code for real dataset without problems (see my
> previous response)
>
> i am just interested how to do it using Nick's code, which works much
> faster (dataset is pretty large).
> i am probably missing some thing about how "sum" works or about
> boundary conditions
>
> my rewrite of Nick was:
>
> bysort ID activity (date_start): gen priorhrs = sum(hours) - hours if
> (date_start>=date_end[_n-1] & !mi(date_end[_n-1]))
>
> which does not work :/
>
> regards
> K.
>
> El día 7 de abril de 2012 14:09, Eric Booth <[email protected]> escribió:
>> <>
>>
>> On Apr 7, 2012, at 9:42 AM, KOTa wrote:
>>
>>> thanks Erik, Nick
>>>
>>> both ways work on this sample, and i managed to adjust for a real database also.
>>>
>>> few followup question:
>>>
>>> 1. Eric, any specific reason why you convert time into date format? I
>>> used it without conversion and it looks fine.
>>> Also on real data your solution somehow goes into infinite loop after
>>> finishing all calculations.
>>>
>> I don't know how you edited the code, so I don't know how you produced an infinite loop? It certainly doesn't do that with the example I gave you. Also, I created a time/date format version of "date" just out of habit, I find that it makes data management easier. If you don't need/want it, then skip it.
>>
>>
>>> 2. Nick, the solution you proposed with "sum" was the one i tried
>>> initially. The problem is that in real data I have 2 columns for date
>>> (start_date end_date) and i want to compare start_date [_n] >=
>>> end_date[all previous]
>>> Eric's solution works longer, but i could adjust it to work with
>>> different date columns.
>>> your solution is works much faster in case there is one date column,
>>> but i could not figure out how to do the same with 2 different
>>> columns.
>>
>> El día 7 de abril de 2012 04:08, Nick Cox <[email protected]> escribió:
>>> Why not
>>> bysort id activity (time): gen priorhrs = sum(time) - time
>>> Nick
>>
>>
>> Nick's -sum- solution does work (and is more straightforward than my -replace line) with a slight tweak - changing the first "time" to "date2" (if you create the time/date var from my example). Also, if you are using the same variable names from your original example you'd change "time" to "hours". Adding this to my example works:
>>
>> bysort ID activity (date2): gen njc = sum(hours) - hours
>>
>>
>> Adapting mine and Nick's code to your "real" dataset is another issue completely - if you cannot figure how to adapt our tips to your data/code, then you should post a data & code example/question that is closer to reality.
>>
>>
>> - Eric
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> Office: +979.845.6754
>>
>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/