Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: algorithmic question : running sum and computations
From
Nick Cox <[email protected]>
To
Francesco <[email protected]>
Subject
Re: st: algorithmic question : running sum and computations
Date
Fri, 17 Aug 2012 14:38:27 +0100
Just -drop- such observations, at least temporarily.
Nick
On Fri, Aug 17, 2012 at 2:33 PM, Francesco <[email protected]> wrote:
> You are perfectly right, Nick.
>
> Unfortunately I cannot rank the observations by time order during a
> day ... I only have the daily stamp of the observation (May 3rd for
> example), not the hour and minute... therefore I cannot know if in
> reality there was one spell or two (or more) according to your
> example...
> So maybe the best solution would be probably to ask a slight
> different question : for a given individual, what is his mean spell
> lenght of time, conditional on days WITHOUT a full round trip. That is
> not considering the days you suggest in your example below... which
> can easily be identified thanks to -bysort id product date, egen
> total_day=total(quantity) and filtered out with the condition
> total_day==0... I guess... :-(
>
> Have a nice day and thanks again,
> Best,
>
> On 17 August 2012 14:55, Nick Cox <[email protected]> wrote:
>> I don't have easy advice on this. As I understand it sorting on
>>
>> id product (date)
>>
>> can't distinguish between
>>
>> id 1 product A date 42 quantity 12
>> id 1 product A date 42 quantity -12
>> id 1 product A date 42 quantity 21
>> id 1 product A date 42 quantity -21
>>
>> and
>>
>> id 1 product A date 42 quantity 12
>> id 1 product A date 42 quantity -21
>> id 1 product A date 42 quantity 21
>> id 1 product A date 42 quantity -12
>>
>> In the first case you have two spells to 0, and in the second one
>> spell to 0. Your example shows that spells need not be two
>> observations long, so I don't know what to suggest.
>>
>> Nick
>>
>> On Fri, Aug 17, 2012 at 1:45 PM, Francesco <[email protected]> wrote:
>>> Actually Nick there is only a slight problem : dates could be repeated
>>> for the same individual AND the same product : for example there
>>> could be several round trips during the same day for the same
>>> product... In that case I would consider that there are as many
>>> delta_Date equal to zero as different round trips during the day for a
>>> particular product... My apologies I did not think of this particular
>>> and important case...
>>>
>>> Could the trick egen panelid = group(id product) be adapted in that case ?
>>>
>>> Many thanks
>>> Best Regards
>>>
>>> On 17 August 2012 13:58, Francesco <[email protected]> wrote:
>>>> Many, Many thanks Nick and Scott for your kind and very precise
>>>> answers! Spells is indeed what I needed ;-)
>>>>
>>>>
>>>> On 17 August 2012 13:43, Nick Cox <[email protected]> wrote:
>>>>> Using your data as a sandpit
>>>>>
>>>>> . clear
>>>>>
>>>>> . input id date str1 product quantity
>>>>>
>>>>> id date product quantity
>>>>> 1. 1 1 A 10
>>>>> 2. 1 2 A -10
>>>>> 3. 1 1 B 100
>>>>> 4. 1 2 B -50
>>>>> 5. 1 4 C 15
>>>>> 6. 1 8 C 100
>>>>> 7. 1 9 C -115
>>>>> 8. 1 10 C 10
>>>>> 9. 1 11 C -10
>>>>> 10. end
>>>>>
>>>>> it seems that we are interested in the length of time it takes for
>>>>> cumulative quantity to return to 0. -sum()- is there for cumulative
>>>>> sums:
>>>>>
>>>>> . bysort id product (date) : gen cumq = sum(q)
>>>>>
>>>>> In one jargon, we are interested in "spells" defined by the fact that
>>>>> they end in 0s for cumulative quantity. In Stata it is easiest to work
>>>>> with initial conditions defining spells, so we negate the date
>>>>> variable to reverse time:
>>>>>
>>>>> . gen negdate = -date
>>>>>
>>>>> As dates can be repeated for the same individual, treating data as
>>>>> panel data requires another fiction, that panels are defined by
>>>>> individuals and products:
>>>>>
>>>>> . egen panelid = group(id product)
>>>>>
>>>>> Now we can -tsset- the data:
>>>>>
>>>>> . tsset panelid negdate
>>>>> panel variable: panelid (unbalanced)
>>>>> time variable: negdate, -11 to -1, but with a gap
>>>>> delta: 1 unit
>>>>>
>>>>> -tsspell- from SSC, which you must install, is a tool for handling
>>>>> spells. It requires -tsset- data; the great benefit of that is that it
>>>>> handles panels automatically. (In fact almost all the credit belongs
>>>>> to StataCorp.) Here the criterion is that a spell is defined by
>>>>> starting with -cumq == 0-
>>>>>
>>>>> . tsspell, fcond(cumq == 0)
>>>>>
>>>>> -tsspell- creates three variables with names by default _spell _seq
>>>>> _end. _end is especially useful: it is an indicator variable for end
>>>>> of spells (beginning of spells when time is reversed). You can read
>>>>> more in the help for -tsspell-.
>>>>>
>>>>> . sort id product date
>>>>>
>>>>> . l id product date cumq _*
>>>>>
>>>>> +---------------------------------------------------+
>>>>> | id product date cumq _spell _seq _end |
>>>>> |---------------------------------------------------|
>>>>> 1. | 1 A 1 10 1 2 1 |
>>>>> 2. | 1 A 2 0 1 1 0 |
>>>>> 3. | 1 B 1 100 0 0 0 |
>>>>> 4. | 1 B 2 50 0 0 0 |
>>>>> 5. | 1 C 4 15 2 3 1 |
>>>>> |---------------------------------------------------|
>>>>> 6. | 1 C 8 115 2 2 0 |
>>>>> 7. | 1 C 9 0 2 1 0 |
>>>>> 8. | 1 C 10 10 1 2 1 |
>>>>> 9. | 1 C 11 0 1 1 0 |
>>>>> +---------------------------------------------------+
>>>>>
>>>>> You want the mean length of completed spells. Completed spells are
>>>>> tagged by _end == 1 or cumq == 0
>>>>>
>>>>> . egen meanlength = mean(_seq/ _end), by(id)
>>>>>
>>>>> This is my favourite division trick: _seq / _end is _seq if _end is 1
>>>>> and missing if _end is 0; missings are ignored by -egen-'s -mean()-
>>>>> function, so you get the mean length for each individual. It is
>>>>> repeated for each observation for each individual so you could go
>>>>>
>>>>> . egen tag = tag(id)
>>>>> . l id meanlength if tag
>>>>>
>>>>> I wrote a tutorial on spells.
>>>>>
>>>>> SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>>>> Q2/07 SJ 7(2):249--265 (no commands)
>>>>> shows how to handle spells with complete control over
>>>>> spell specification
>>>>>
>>>>> which is accessible at
>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
>>>>>
>>>>> Its principles underlie -tsspell-, but -tsspell- is not even
>>>>> mentioned, for which there is a mundane explanation. Explaining some
>>>>> basics as clearly and carefully as I could produced a paper that was
>>>>> already long and detailed, and adding detail on -tsspell- would just
>>>>> have made that worse.
>>>>>
>>>>> For more on spells, see Rowling (1997, 1998, 1999, etc.).
>>>>>
>>>>> Nick
>>>>>
>>>>> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <[email protected]> wrote:
>>>>>> Dear Statalist,
>>>>>>
>>>>>> I am stuck with a little algorithmic problem and I cannot find an
>>>>>> simple (or elegant) solution...
>>>>>>
>>>>>> I have a panel dataset as (date in days) :
>>>>>>
>>>>>> ID DATE PRODUCT QUANTITY
>>>>>> 1 1 A 10
>>>>>> 1 2 A -10
>>>>>>
>>>>>> 1 1 B 100
>>>>>> 1 2 B -50
>>>>>>
>>>>>> 1 4 C 15
>>>>>> 1 8 C 100
>>>>>> 1 9 C -115
>>>>>>
>>>>>> 1 10 C 10
>>>>>> 1 11 C -10
>>>>>>
>>>>>>
>>>>>>
>>>>>> and I would like to know the average time (in days) it takes for an
>>>>>> individual in order to complete a full round trip (the variation in
>>>>>> quantity is zero)
>>>>>> For example, for the first id we can see that there we have
>>>>>>
>>>>>> ID PRODUCT delta_DATE delta_QUANTITY
>>>>>> 1 A 1=2-1 0=10-10
>>>>>> 1 C 5=4-9 0=15+100-115
>>>>>> 1 C 1=11-10 0=10-10
>>>>>>
>>>>>> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full
>>>>>> round trip. Indeed I can discard product B because there is no round
>>>>>> trip, that is 100-50 is not equal to zero.
>>>>>>
>>>>>> My question is therefore ... do you have an idea obtain this simply in
>>>>>> Stata ? I have to average across thousands of individuals... :)
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/