From: Francesco <[email protected]>
To: [email protected]
Subject: Re: st: algorithmic question : running sum and computations
Date: Fri, 17 Aug 2012 13:58:45 +0200
Many, many thanks, Nick and Scott, for your kind and very precise
answers! Spells are indeed what I needed ;-)
On 17 August 2012 13:43, Nick Cox <[email protected]> wrote:
> Using your data as a sandpit
>
> . clear
>
> . input id date str1 product quantity
> 1  1 A   10
> 1  2 A  -10
> 1  1 B  100
> 1  2 B  -50
> 1  4 C   15
> 1  8 C  100
> 1  9 C -115
> 1 10 C   10
> 1 11 C  -10
> end
>
> it seems that we are interested in the length of time it takes for
> cumulative quantity to return to 0. -sum()- is there for cumulative
> sums:
>
> . bysort id product (date) : gen cumq = sum(q)
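
A cautious variation on the running sum, offered only as a sketch: if quantities can be non-integer, a running total stored as float may not return to exactly 0, so one might hold it in double precision and compare against a small tolerance. The names cumq2 and atzero and the tolerance 1e-8 are illustrative assumptions, not part of the original answer.

```stata
* Assumes the example data above are in memory
* Running sum held in double precision (cumq2 is a hypothetical name)
bysort id product (date): gen double cumq2 = sum(quantity)

* Flag "back at zero" within a tolerance; 1e-8 is an arbitrary illustrative choice
gen byte atzero = abs(cumq2) < 1e-8
```

With the integer quantities in this thread, the flag agrees with an exact test of cumq2 == 0.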
>
> In one jargon, we are interested in "spells" defined by the fact that
> they end in 0s for cumulative quantity. In Stata it is easiest to work
> with initial conditions defining spells, so we negate the date
> variable to reverse time:
>
> . gen negdate = -date
>
> As dates can be repeated for the same individual, treating data as
> panel data requires another fiction, that panels are defined by
> individuals and products:
>
> . egen panelid = group(id product)
>
> Now we can -tsset- the data:
>
> . tsset panelid negdate
> panel variable: panelid (unbalanced)
> time variable: negdate, -11 to -1, but with a gap
> delta: 1 unit
>
> -tsspell- from SSC, which you must install, is a tool for handling
> spells. It requires -tsset- data; the great benefit of that is that it
> handles panels automatically. (In fact almost all the credit belongs
> to StataCorp.) Here the criterion is that a spell is defined by
> starting with -cumq == 0-:
>
> . tsspell, fcond(cumq == 0)
>
> -tsspell- creates three variables with names by default _spell _seq
> _end. _end is especially useful: it is an indicator variable for end
> of spells (beginning of spells when time is reversed). You can read
> more in the help for -tsspell-.
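
Since -tsspell- is not part of official Stata, it has to be installed before the command above will run. A minimal sketch of the usual steps, assuming internet access from within Stata:

```stata
* One-time installation from SSC (requires internet access)
ssc install tsspell

* Confirm that Stata can find the command, then read its documentation
which tsspell
help tsspell
```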
>
> . sort id product date
>
> . l id product date cumq _*
>
>      +---------------------------------------------------+
>      | id   product   date   cumq   _spell   _seq   _end |
>      |---------------------------------------------------|
>   1. |  1         A      1     10        1      2      1 |
>   2. |  1         A      2      0        1      1      0 |
>   3. |  1         B      1    100        0      0      0 |
>   4. |  1         B      2     50        0      0      0 |
>   5. |  1         C      4     15        2      3      1 |
>      |---------------------------------------------------|
>   6. |  1         C      8    115        2      2      0 |
>   7. |  1         C      9      0        2      1      0 |
>   8. |  1         C     10     10        1      2      1 |
>   9. |  1         C     11      0        1      1      0 |
>      +---------------------------------------------------+
>
> You want the mean length of completed spells. Completed spells are
> tagged by _end == 1 or cumq == 0:
>
> . egen meanlength = mean(_seq / _end), by(id)
>
> This is my favourite division trick: _seq / _end is _seq if _end is 1
> and missing if _end is 0; missings are ignored by -egen-'s -mean()-
> function, so you get the mean length for each individual. It is
> repeated for each observation for each individual so you could go
>
> . egen tag = tag(id)
> . l id meanlength if tag
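
To see why the division trick works, it may help to check how Stata treats division by zero and to compare the result with a more explicit form. The sketch below assumes -tsspell- has been run and meanlength created as above; meanlength2 is a name chosen purely for illustration.

```stata
* Division by zero yields missing (.) in Stata rather than an error
display 3/1
display 3/0

* Same mean written with cond(): keep _seq where _end is 1, missing otherwise
egen meanlength2 = mean(cond(_end == 1, _seq, .)), by(id)

* The two versions should agree observation by observation
assert meanlength2 == meanlength
```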
>
> I wrote a tutorial on spells.
>
> SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q2/07 SJ 7(2):249--265 (no commands)
> shows how to handle spells with complete control over
> spell specification
>
> which is accessible at
> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
>
> Its principles underlie -tsspell-, but -tsspell- is not even
> mentioned, for which there is a mundane explanation. Explaining some
> basics as clearly and carefully as I could produced a paper that was
> already long and detailed, and adding detail on -tsspell- would just
> have made that worse.
>
> For more on spells, see Rowling (1997, 1998, 1999, etc.).
>
> Nick
>
> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <[email protected]> wrote:
>> Dear Statalist,
>>
>> I am stuck with a little algorithmic problem and I cannot find a
>> simple (or elegant) solution...
>>
>> I have a panel dataset like this (date in days):
>>
>> ID DATE PRODUCT QUANTITY
>> 1 1 A 10
>> 1 2 A -10
>>
>> 1 1 B 100
>> 1 2 B -50
>>
>> 1 4 C 15
>> 1 8 C 100
>> 1 9 C -115
>>
>> 1 10 C 10
>> 1 11 C -10
>>
>>
>>
>> and I would like to know the average time (in days) it takes an
>> individual to complete a full round trip (that is, for the net change
>> in quantity to return to zero).
>> For example, for the first id we have:
>>
>> ID  PRODUCT  delta_DATE  delta_QUANTITY
>> 1   A        1=2-1       0=10-10
>> 1   C        5=9-4       0=15+100-115
>> 1   C        1=11-10     0=10-10
>>
>> so on average individual 1 takes (1+5+1)/3=2.33 days to complete a full
>> round trip. I can discard product B because there is no round
>> trip there: 100-50 is not equal to zero.
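
The arithmetic in this example can be reproduced by hand in Stata. The following is a self-contained sketch, not the method given in the reply: it counts calendar days between the first and last transaction of each closed trip. The variable names closes, trip, days and meandays are assumptions made for illustration. Note that _seq in the listing in this thread counts observations within a spell (3 for the product C spell spanning days 4 to 9), so the two measures can differ in general, although they give the same mean on these data.

```stata
clear
input id date str1 product quantity
1  1 A   10
1  2 A  -10
1  1 B  100
1  2 B  -50
1  4 C   15
1  8 C  100
1  9 C -115
1 10 C   10
1 11 C  -10
end

* Running position within each id-product pair
bysort id product (date): gen cumq = sum(quantity)

* A trip closes whenever the position is back at zero
gen byte closes = cumq == 0

* Trip counter: 1 + number of closings strictly before this observation
* (closes[_n-1] is missing for the first observation; sum() treats it as 0)
by id product: gen trip = 1 + sum(closes[_n-1])

* Elapsed days, recorded once per trip and only if the trip closed
bysort id product trip (date): gen days = date[_N] - date[1] if _n == _N & closes

* Missing values (open trips, non-final rows) are ignored by mean()
egen meandays = mean(days), by(id)
list id product date cumq trip days meandays, sepby(product)
```

On the example data the closed trips last 1, 5 and 1 days, so meandays is 7/3, about 2.33, for individual 1; product B never returns to zero and contributes nothing.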
>>
>> My question is therefore: do you have an idea how to obtain this simply in
>> Stata? I have to average across thousands of individuals... :)
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/