Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: How to define shortest possible period with 95% of observations
From
Austin Nichols <[email protected]>
To
[email protected]
Subject
Re: st: RE: How to define shortest possible period with 95% of observations
Date
Wed, 12 May 2010 15:41:38 -0400
Daniel et al.--
I think Robert's code also assumes that only observations for one year
are in memory; it is not -by-able. So you might be incorrectly
calculating within your loop across years--you may want to create some
fake data as Robert did and show us what you are doing. A
disadvantage of moving to a non-calendar year is that it is harder to
explain, but Oct 1 to Sep 30 might work for your purposes (if peak
times are near Apr 1), and corresponds to a common fiscal year, fwiw.
On Wed, May 12, 2010 at 3:33 PM, Steve Samuels <[email protected]> wrote:
> What you are asking is not only contrary to Nick's recommendation
> (and, now, mine), it is a mistake. You are assuming that the peak day
> for fires occurs exactly in the middle of a "fire year". Of course
> the peak days will differ from year to year, and you will wind up
> with overlapping periods.
>
> Steve
>
>
>
> On Wed, May 12, 2010 at 3:13 PM, Nick Cox <[email protected]> wrote:
>> What you outline is not what I was recommending. It's an awkward
>> half-way house.
>>
>> But in terms of your new question, see
>>
>> SJ-6-4 dm0025 . . . . . . . . . . Stata tip 36: Which observations?
>> Erratum
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
>> J. Cox
>> Q4/06 SJ 6(4):596 (no commands)
>> correction of example code for Stata tip 36
>>
>> SJ-6-3 dm0025 . . . . . . . . . . . . . . Stata tip 36: Which
>> observations?
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
>> J. Cox
>> Q3/06 SJ 6(3):430--432 (no
>> commands)
>> tip for identifying which observations satisfy some
>> specified condition
>>
>> Nick
>> [email protected]
>>
>> Daniel Mueller
>>
>> Thanks.
>>
>> In the code below I did my best to think in fire years by defining the
>> peak fire day (these are always unique) as the middle of the year. Then
>> I'd like to place the maximum into a local macro (where I fail miserably
>>
>> in line 2 as Steve rightly pointed out):
>>
>> qui su no_fire_day
>> loc yearstart = Day[r(max)] - 183
>> loc yearend = `yearstart' + 365
>>
>> My simple question is how I can place the Day where no_fires_day =
>> r(max) into a local macro?
>>
>> (I ignore leap years for the sake of simplicity..)
>>
>>
>> Nick Cox wrote on 5/12/2010 10:44 PM:
>>> Without looking at this in detail, it seems to me that you might
>> benefit
>>> from thinking in terms of fire years, rather than calendar years,
>>> starting on some day other than January 1. After all, all sorts of
>>> different sciences, not to mention religions, have years that don't
>>> coincide with conventional Western calendar years: fiscal years, water
>>> years, academic years, etc., etc.
>>>
>>> Several pertinent -egen- functions are included in -egenmore- on SSC.
>>>
>>> In other words, define the time scale in terms of those fire years;
>> then
>>> Robert's code will probably not need any complicated adjustments.
>>>
>>> Nick
>>> [email protected]
>>>
>>> Daniel Mueller
>>>
>>> Robert, this works like charm!!! Thanks a bunch for this neat code.
>> Also
>>>
>>> thanks to Nick for pointing me to -shorth- which I will certainly
>>> explore in more detail after having sipped through the extensive
>>> reference list.
>>>
>>> Using Roberts code I can seamlessly loop over the nine years of data
>> and
>>>
>>> generate the shortest fire season per year with 95% of obs. The
>> results
>>> suggested an additional complication.. For some subsets the shortest
>>> possible period likely starts a couple of days before Jan 1st, at the
>>> end of the preceding year.
>>>
>>> I tweaked Roberts code a little to loop over years and defined the
>>> middle of a year as the peak fire day. The code runs through, yet sets
>>> the start of the fire season for some subsets to Jan 1st, while my
>>> educated guess is that it should be somewhere around mid to end of
>>> December. Something went wrong, but I can't spot the glitch in the
>> code
>>> below. Can someone please help?
>>>
>>> Thanks a lot in advance and best regards,
>>> Daniel
>>>
>>>
>>> *** start
>>> forv y = `yearfirst'/`yearlast' {
>>>
>>> * keep previous year
>>> if `y' != `yearfirst' {
>>> keep if Year == `y' | Year == (`y'-1)
>>> }
>>> bys Day: g no_fire_day = _N
>>> qui su no_fire_day
>>>
>>> * define year to start 183 days before peak fire day
>>> loc yearstart = Day[r(max)] - 183
>>> loc yearend = `yearstart' + 365
>>> keep if Day> `yearstart'& Day< `yearend' // or with
>> egen->rotate?
>>> bys Day: keep if _n == _N
>>> g nobs = _n
>>>
>>> * the target is a continuous run that includes 95% of all fires
>>> sum no_fire_day, meanonly
>>> scalar target = .95 * r(sum)
>>>
>>> scalar shortlen = .
>>> gen arun = .
>>> gen bestrun = .
>>>
>>> * at each pass, create a run that starts at nobs == `i'
>>> * and identify the nobs where the number of fires>= 95%
>>> local more 1
>>> local i = 0
>>> while `more' {
>>> local i = `i' + 1
>>> qui replace arun = sum(no_fire_day * (nobs>= `i'))
>>> sum nobs if arun>= target, meanonly
>>> if r(N) == 0 local more 0
>>> else if (Day[r(min)] - Day[`i'])< shortlen {
>>> scalar shortlen = Day[r(min)] - Day[`i']
>>> qui replace bestrun = arun
>>> qui replace bestrun = . if nobs> r(min) | nobs< `i'
>>> }
>>> }
>>> qui drop if bestrun == .
>>> drop bestrun arun
>>> save fires_`y', replace
>>> }
>>> *** end
>>>
>>>
>>>
>>>
>>>
>>> Robert Picard wrote on 5/11/2010 3:28 AM:
>>>> Here is how I would approach this problem. I would do each year
>>>> separately; it could be done all at once but it would complicate the
>>>> code unnecessarily. If the fire data is one observation per fire, I
>>>> would -collapse- it to one observation per day. Each observation
>> would
>>>> contain the number of fires that day. The following code will
>> identify
>>>> the first instance of the shortest run of days that includes 95% of
>>>> fires for the year.
>>>>
>>>> Note that the following code will work, even if there are days
>> without
>>>> fires (and thus no observation for that day).
>>>>
>>>> *--------------------------- begin example -----------------------
>>>> version 11
>>>>
>>>> * daily fire counts; with some days without fires
>>>> clear all
>>>> set seed 123
>>>> set obs 365
>>>> gen day = _n
>>>> drop if uniform()< .1
>>>> gen nobs = _n
>>>> gen nfires = round(uniform() * 10)
>>>>
>>>> * the target is a continuous run that includes 95% of all fires
>>>> sum nfires, meanonly
>>>> scalar target = .95 * r(sum)
>>>> dis target
>>>>
>>>> scalar shortlen = .
>>>> gen arun = .
>>>> gen bestrun = .
>>>>
>>>> * at each pass, create a run that starts at nobs == `i'
>>>> * and identify the nobs where the number of fires>= 95%
>>>> local more 1
>>>> local i 0
>>>> while `more' {
>>>> local i = `i' + 1
>>>> qui replace arun = sum(nfires * (nobs>=`i'))
>>>> sum nobs if arun>= target, meanonly
>>>> if r(N) == 0 local more 0
>>>> else if (day[r(min)] - day[`i'])< shortlen {
>>>> scalar shortlen = day[r(min)] - day[`i']
>>>> qui replace bestrun = arun
>>>> qui replace bestrun = . if nobs> r(min) | nobs< `i'
>>>> }
>>>> }
>>>>
>>>> *--------------------- end example --------------------------
>>>>
>>>>
>>>> Hope this help,
>>>>
>>>> Robert
>>>>
>>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<[email protected]>
>>> wrote:
>>>>> I don't think any trick is possible unless you know in advance the
>>>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>>>> whatever, which here is not the case.
>>>>>
>>>>> So, you need to look at all the possibilities from the interval
>>> starting
>>>>> at the minimum to the interval starting at the 5% point of the fire
>>>>> number distribution in each year.
>>>>>
>>>>> However, this may all be achievable using -shorth- (SSC). Look at
>> the
>>>>> -proportion()- option, but you would need to -expand- first to get a
>>>>> separate observation for each fire. If that's not practicable, look
>>>>> inside the code of -shorth- to get ideas on how to proceed. Note
>> that
>>> no
>>>>> looping is necessary: the whole problem will reduce to use of -by:-
>>> and
>>>>> subscripts.
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>> Daniel Mueller
>>>>>
>>>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>>>> occurrences per day) that contain between none (no fire) and 3,000
>>> fires
>>>>>
>>>>> per day for 8 years. The fire events peak in March and April with
>>> about
>>>>> 85-90% of the yearly total.
>>>>>
>>>>> My question is how I can define the shortest possible continuous
>>> period
>>>>> of days for each year that contains 95% of all yearly fires. The
>>> length
>>>>> and width of the periods may slightly differ across the years due to
>>>>> climate and other parameters.
>>>>>
>>>>> I am sure there is a neat trick in Stata for this, yet I have not
>>>>> spotted it. Any suggestions would be appreciated.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/