Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: How to define shortest possible period with 95% of observations
From
Steve Samuels <[email protected]>
To
[email protected]
Subject
Re: st: RE: How to define shortest possible period with 95% of observations
Date
Wed, 12 May 2010 15:33:41 -0400
What you are asking is not only contrary to Nick's recommendation
(and, now, mine), it is a mistake. You are assuming that the peak day
for fires occurs exactly in the middle of a "fire year". Of course
the peak days will differ from year to year, and you will wind up
with overlapping periods.
Steve
On Wed, May 12, 2010 at 3:13 PM, Nick Cox <[email protected]> wrote:
> What you outline is not what I was recommending. It's an awkward
> half-way house.
>
> But in terms of your new question, see
>
> SJ-6-4 dm0025 . . . . . . . . . . Stata tip 36: Which observations?
> Erratum
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
> J. Cox
> Q4/06 SJ 6(4):596 (no commands)
> correction of example code for Stata tip 36
>
> SJ-6-3 dm0025 . . . . . . . . . . . . . . Stata tip 36: Which
> observations?
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N.
> J. Cox
> Q3/06 SJ 6(3):430--432 (no
> commands)
> tip for identifying which observations satisfy some
> specified condition
>
> Nick
> [email protected]
>
> Daniel Mueller
>
> Thanks.
>
> In the code below I did my best to think in fire years by defining the
> peak fire day (these are always unique) as the middle of the year. Then
> I'd like to place the maximum into a local macro (where I fail miserably
>
> in line 2 as Steve rightly pointed out):
>
> qui su no_fire_day
> loc yearstart = Day[r(max)] - 183
> loc yearend = `yearstart' + 365
>
> My simple question is how I can place the Day where no_fires_day =
> r(max) into a local macro?
>
> (I ignore leap years for the sake of simplicity..)
>
>
> Nick Cox wrote on 5/12/2010 10:44 PM:
>> Without looking at this in detail, it seems to me that you might
> benefit
>> from thinking in terms of fire years, rather than calendar years,
>> starting on some day other than January 1. After all, all sorts of
>> different sciences, not to mention religions, have years that don't
>> coincide with conventional Western calendar years: fiscal years, water
>> years, academic years, etc., etc.
>>
>> Several pertinent -egen- functions are included in -egenmore- on SSC.
>>
>> In other words, define the time scale in terms of those fire years;
> then
>> Robert's code will probably not need any complicated adjustments.
>>
>> Nick
>> [email protected]
>>
>> Daniel Mueller
>>
>> Robert, this works like charm!!! Thanks a bunch for this neat code.
> Also
>>
>> thanks to Nick for pointing me to -shorth- which I will certainly
>> explore in more detail after having sipped through the extensive
>> reference list.
>>
>> Using Roberts code I can seamlessly loop over the nine years of data
> and
>>
>> generate the shortest fire season per year with 95% of obs. The
> results
>> suggested an additional complication.. For some subsets the shortest
>> possible period likely starts a couple of days before Jan 1st, at the
>> end of the preceding year.
>>
>> I tweaked Roberts code a little to loop over years and defined the
>> middle of a year as the peak fire day. The code runs through, yet sets
>> the start of the fire season for some subsets to Jan 1st, while my
>> educated guess is that it should be somewhere around mid to end of
>> December. Something went wrong, but I can't spot the glitch in the
> code
>> below. Can someone please help?
>>
>> Thanks a lot in advance and best regards,
>> Daniel
>>
>>
>> *** start
>> forv y = `yearfirst'/`yearlast' {
>>
>> * keep previous year
>> if `y' != `yearfirst' {
>> keep if Year == `y' | Year == (`y'-1)
>> }
>> bys Day: g no_fire_day = _N
>> qui su no_fire_day
>>
>> * define year to start 183 days before peak fire day
>> loc yearstart = Day[r(max)] - 183
>> loc yearend = `yearstart' + 365
>> keep if Day> `yearstart'& Day< `yearend' // or with
> egen->rotate?
>> bys Day: keep if _n == _N
>> g nobs = _n
>>
>> * the target is a continuous run that includes 95% of all fires
>> sum no_fire_day, meanonly
>> scalar target = .95 * r(sum)
>>
>> scalar shortlen = .
>> gen arun = .
>> gen bestrun = .
>>
>> * at each pass, create a run that starts at nobs == `i'
>> * and identify the nobs where the number of fires>= 95%
>> local more 1
>> local i = 0
>> while `more' {
>> local i = `i' + 1
>> qui replace arun = sum(no_fire_day * (nobs>= `i'))
>> sum nobs if arun>= target, meanonly
>> if r(N) == 0 local more 0
>> else if (Day[r(min)] - Day[`i'])< shortlen {
>> scalar shortlen = Day[r(min)] - Day[`i']
>> qui replace bestrun = arun
>> qui replace bestrun = . if nobs> r(min) | nobs< `i'
>> }
>> }
>> qui drop if bestrun == .
>> drop bestrun arun
>> save fires_`y', replace
>> }
>> *** end
>>
>>
>>
>>
>>
>> Robert Picard wrote on 5/11/2010 3:28 AM:
>>> Here is how I would approach this problem. I would do each year
>>> separately; it could be done all at once but it would complicate the
>>> code unnecessarily. If the fire data is one observation per fire, I
>>> would -collapse- it to one observation per day. Each observation
> would
>>> contain the number of fires that day. The following code will
> identify
>>> the first instance of the shortest run of days that includes 95% of
>>> fires for the year.
>>>
>>> Note that the following code will work, even if there are days
> without
>>> fires (and thus no observation for that day).
>>>
>>> *--------------------------- begin example -----------------------
>>> version 11
>>>
>>> * daily fire counts; with some days without fires
>>> clear all
>>> set seed 123
>>> set obs 365
>>> gen day = _n
>>> drop if uniform()< .1
>>> gen nobs = _n
>>> gen nfires = round(uniform() * 10)
>>>
>>> * the target is a continuous run that includes 95% of all fires
>>> sum nfires, meanonly
>>> scalar target = .95 * r(sum)
>>> dis target
>>>
>>> scalar shortlen = .
>>> gen arun = .
>>> gen bestrun = .
>>>
>>> * at each pass, create a run that starts at nobs == `i'
>>> * and identify the nobs where the number of fires>= 95%
>>> local more 1
>>> local i 0
>>> while `more' {
>>> local i = `i' + 1
>>> qui replace arun = sum(nfires * (nobs>=`i'))
>>> sum nobs if arun>= target, meanonly
>>> if r(N) == 0 local more 0
>>> else if (day[r(min)] - day[`i'])< shortlen {
>>> scalar shortlen = day[r(min)] - day[`i']
>>> qui replace bestrun = arun
>>> qui replace bestrun = . if nobs> r(min) | nobs< `i'
>>> }
>>> }
>>>
>>> *--------------------- end example --------------------------
>>>
>>>
>>> Hope this help,
>>>
>>> Robert
>>>
>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<[email protected]>
>> wrote:
>>>> I don't think any trick is possible unless you know in advance the
>>>> precise distribution, e.g. that it is Gaussian, or exponential, or
>>>> whatever, which here is not the case.
>>>>
>>>> So, you need to look at all the possibilities from the interval
>> starting
>>>> at the minimum to the interval starting at the 5% point of the fire
>>>> number distribution in each year.
>>>>
>>>> However, this may all be achievable using -shorth- (SSC). Look at
> the
>>>> -proportion()- option, but you would need to -expand- first to get a
>>>> separate observation for each fire. If that's not practicable, look
>>>> inside the code of -shorth- to get ideas on how to proceed. Note
> that
>> no
>>>> looping is necessary: the whole problem will reduce to use of -by:-
>> and
>>>> subscripts.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>> Daniel Mueller
>>>>
>>>> I have a strongly unbalanced panel with 100,000 observations (=fire
>>>> occurrences per day) that contain between none (no fire) and 3,000
>> fires
>>>>
>>>> per day for 8 years. The fire events peak in March and April with
>> about
>>>> 85-90% of the yearly total.
>>>>
>>>> My question is how I can define the shortest possible continuous
>> period
>>>> of days for each year that contains 95% of all yearly fires. The
>> length
>>>> and width of the periods may slightly differ across the years due to
>>>> climate and other parameters.
>>>>
>>>> I am sure there is a neat trick in Stata for this, yet I have not
>>>> spotted it. Any suggestions would be appreciated.
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/