Re: st: algorithmic question : running sum and computations
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: algorithmic question : running sum and computations
Date: Fri, 17 Aug 2012 12:43:56 +0100
Using your data as a sandpit
. clear
. input id date str1 product quantity
id date product quantity
1. 1 1 A 10
2. 1 2 A -10
3. 1 1 B 100
4. 1 2 B -50
5. 1 4 C 15
6. 1 8 C 100
7. 1 9 C -115
8. 1 10 C 10
9. 1 11 C -10
10. end
it seems that we are interested in the length of time it takes for
cumulative quantity to return to 0. -sum()- is there for cumulative
sums:
. bysort id product (date) : gen cumq = sum(q)
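A quick way to check the running sum is to list it by group and confirm that the last value in each group equals the group total. This is only a sketch: it assumes the example data and -cumq- from the commands above are in memory, and -totq- is a made-up name for a temporary check variable.

```stata
* Assumes the example data and cumq created above are in memory.
list id product date quantity cumq, sepby(id product)

* Within each id-product group, the last running sum should equal the total.
* totq is a hypothetical helper variable, dropped again afterwards.
egen double totq = total(quantity), by(id product)
bysort id product (date) : assert cumq[_N] == totq
drop totq
```

Note that -sum()- treats missing values as zero, so a missing quantity would leave the running sum unchanged rather than make it missing.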
In one jargon, we are interested in "spells" defined by the fact that
they end in 0s for cumulative quantity. In Stata it is easiest to work
with initial conditions defining spells, so we negate the date
variable to reverse time:
. gen negdate = -date
As dates can be repeated for the same individual, treating data as
panel data requires another fiction, that panels are defined by
individuals and products:
. egen panelid = group(id product)
Now we can -tsset- the data:
. tsset panelid negdate
panel variable: panelid (unbalanced)
time variable: negdate, -11 to -1, but with a gap
delta: 1 unit
-tsspell- from SSC, which you must install, is a tool for handling
spells. It requires -tsset- data; the great benefit of that is that it
handles panels automatically. (In fact almost all the credit belongs
to StataCorp.) Here the criterion is that a spell is defined by
starting with -cumq == 0-
. tsspell, fcond(cumq == 0)
-tsspell- creates three variables with names by default _spell _seq
_end. _end is especially useful: it is an indicator variable for end
of spells (beginning of spells when time is reversed). You can read
more in the help for -tsspell-.
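If -tsspell- is not yet installed, the call above will fail with an unrecognized command error. Installation from SSC is a one-off step, assuming your Stata can reach the internet:

```stata
* tsspell is user-written, so install it once from SSC before first use.
ssc install tsspell

* Its help file documents the spell criteria and the _spell, _seq, _end variables.
help tsspell
```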
. sort id product date
. l id product date cumq _*
     +---------------------------------------------------+
     | id   product   date   cumq   _spell   _seq   _end |
     |---------------------------------------------------|
  1. |  1         A      1     10        1      2      1 |
  2. |  1         A      2      0        1      1      0 |
  3. |  1         B      1    100        0      0      0 |
  4. |  1         B      2     50        0      0      0 |
  5. |  1         C      4     15        2      3      1 |
     |---------------------------------------------------|
  6. |  1         C      8    115        2      2      0 |
  7. |  1         C      9      0        2      1      0 |
  8. |  1         C     10     10        1      2      1 |
  9. |  1         C     11      0        1      1      0 |
     +---------------------------------------------------+
You want the mean length of completed spells. Each completed spell is
tagged once by _end == 1 (and once, at its other end, by cumq == 0):
. egen meanlength = mean(_seq/ _end), by(id)
This is my favourite division trick: _seq / _end is _seq if _end is 1
and missing if _end is 0; missings are ignored by -egen-'s -mean()-
function, so you get the mean length for each individual. It is
repeated for each observation for each individual so you could go
. egen tag = tag(id)
. l id meanlength if tag
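If the division trick looks too cryptic, the same selection can be spelled out with -cond()-. The sketch below assumes _seq, _end and meanlength from the commands above already exist; -meanlength2- is a made-up name used only for the comparison.

```stata
* Assumes _seq, _end and meanlength created above are in memory.
* cond() returns _seq where _end is 1 and missing otherwise,
* which is what _seq / _end does implicitly (division by 0 yields missing).
egen meanlength2 = mean(cond(_end == 1, _seq, .)), by(id)

* Both versions should agree for every observation.
assert meanlength2 == meanlength
```

The -cond()- form is longer but states the condition explicitly, which may be easier to adapt if the criterion for a completed spell changes.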
I wrote a tutorial on spells.
SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q2/07 SJ 7(2):249--265 (no commands)
shows how to handle spells with complete control over
spell specification
which is accessible at
http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
Its principles underlie -tsspell-, but -tsspell- is not even
mentioned, for which there is a mundane explanation. Explaining some
basics as clearly and carefully as I could produced a paper that was
already long and detailed, and adding detail on -tsspell- would just
have made that worse.
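One point worth checking against the listing above: _seq counts observations in a spell, not days. For the example the completed spells have _seq of 2, 3 and 2, while the elapsed times in the original question are 1, 5 and 1 days; the two means happen to coincide at 7/3 here, but need not in general. If elapsed days are what is wanted, spells can also be built directly with -by:-. What follows is only a sketch under stated assumptions, not the method of the paper or of -tsspell-; the names runq, trip, tripdays, meandays and first are made up for illustration.

```stata
clear
input id date str1 product quantity
1  1 A   10
1  2 A  -10
1  1 B  100
1  2 B  -50
1  4 C   15
1  8 C  100
1  9 C -115
1 10 C   10
1 11 C  -10
end

* Running sum within id and product, in date order.
bysort id product (date) : gen runq = sum(quantity)

* A new trip starts at the first observation of a group,
* or immediately after the running sum has returned to zero.
by id product : gen trip = sum(_n == 1 | runq[_n-1] == 0)

* Elapsed days, recorded once per trip and only for trips ending at zero.
bysort id product trip (date) : gen tripdays = date[_N] - date[1] if _n == _N & runq[_N] == 0

* egen's mean() ignores the missing values left by incomplete trips.
egen meandays = mean(tripdays), by(id)
egen first = tag(id)
list id meandays if first
```

For the example data the completed trips last 1, 5 and 1 days, so meandays is 7/3 for individual 1, and product B is ignored because its running sum never returns to zero.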
For more on spells, see Rowling (1997, 1998, 1999, etc.).
Nick
On Fri, Aug 17, 2012 at 11:30 AM, Francesco <[email protected]> wrote:
> Dear Statalist,
>
> I am stuck with a little algorithmic problem and I cannot find a
> simple (or elegant) solution...
>
> I have a panel dataset as (date in days) :
>
> ID DATE PRODUCT QUANTITY
> 1 1 A 10
> 1 2 A -10
>
> 1 1 B 100
> 1 2 B -50
>
> 1 4 C 15
> 1 8 C 100
> 1 9 C -115
>
> 1 10 C 10
> 1 11 C -10
>
>
>
> and I would like to know the average time (in days) it takes for an
> individual in order to complete a full round trip (the variation in
> quantity is zero)
> For example, for the first id we can see that there we have
>
> ID PRODUCT delta_DATE delta_QUANTITY
> 1 A 1=2-1 0=10-10
> 1 C 5=9-4 0=15+100-115
> 1 C 1=11-10 0=10-10
>
> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full
> round trip. Indeed I can discard product B because there is no round
> trip, that is 100-50 is not equal to zero.
>
> My question is therefore ... do you have an idea how to obtain this simply in
> Stata ? I have to average across thousands of individuals... :)