Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Austin Nichols <austinnichols@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: How to define shortest possible period with 95% of observations |
Date | Wed, 12 May 2010 15:41:38 -0400 |
Daniel et al.-- I think Robert's code also assumes that only observations for one year are in memory; it is not -by-able. So you might be incorrectly calculating within your loop across years--you may want to create some fake data as Robert did and show us what you are doing. A disadvantage of moving to a non-calendar year is that it is harder to explain, but Oct 1 to Sep 30 might work for your purposes (if peak times are near Apr 1), and corresponds to a common fiscal year, fwiw. On Wed, May 12, 2010 at 3:33 PM, Steve Samuels <sjsamuels@gmail.com> wrote: > What you are asking is not only contrary to Nick's recommendation > (and, now, mine), it is a mistake. You are assuming that the peak day > for fires occurs exactly in the middle of a "fire year". Of course > the peak days will differ from year to year, and you will wind up > with overlapping periods. > > Steve > > > > On Wed, May 12, 2010 at 3:13 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote: >> What you outline is not what I was recommending. It's an awkward >> half-way house. >> >> But in terms of your new question, see >> >> SJ-6-4 dm0025 . . . . . . . . . . Stata tip 36: Which observations? >> Erratum >> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. >> J. Cox >> Q4/06 SJ 6(4):596 (no commands) >> correction of example code for Stata tip 36 >> >> SJ-6-3 dm0025 . . . . . . . . . . . . . . Stata tip 36: Which >> observations? >> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. >> J. Cox >> Q3/06 SJ 6(3):430--432 (no >> commands) >> tip for identifying which observations satisfy some >> specified condition >> >> Nick >> n.j.cox@durham.ac.uk >> >> Daniel Mueller >> >> Thanks. >> >> In the code below I did my best to think in fire years by defining the >> peak fire day (these are always unique) as the middle of the year. Then >> I'd like to place the maximum into a local macro (where I fail miserably >> >> in line 2 as Steve rightly pointed out): >> >> qui su no_fire_day >> loc yearstart = Day[r(max)] - 183 >> loc yearend = `yearstart' + 365 >> >> My simple question is how I can place the Day where no_fires_day = >> r(max) into a local macro? >> >> (I ignore leap years for the sake of simplicity..) >> >> >> Nick Cox wrote on 5/12/2010 10:44 PM: >>> Without looking at this in detail, it seems to me that you might >> benefit >>> from thinking in terms of fire years, rather than calendar years, >>> starting on some day other than January 1. After all, all sorts of >>> different sciences, not to mention religions, have years that don't >>> coincide with conventional Western calendar years: fiscal years, water >>> years, academic years, etc., etc. >>> >>> Several pertinent -egen- functions are included in -egenmore- on SSC. >>> >>> In other words, define the time scale in terms of those fire years; >> then >>> Robert's code will probably not need any complicated adjustments. >>> >>> Nick >>> n.j.cox@durham.ac.uk >>> >>> Daniel Mueller >>> >>> Robert, this works like charm!!! Thanks a bunch for this neat code. >> Also >>> >>> thanks to Nick for pointing me to -shorth- which I will certainly >>> explore in more detail after having sipped through the extensive >>> reference list. >>> >>> Using Roberts code I can seamlessly loop over the nine years of data >> and >>> >>> generate the shortest fire season per year with 95% of obs. The >> results >>> suggested an additional complication.. For some subsets the shortest >>> possible period likely starts a couple of days before Jan 1st, at the >>> end of the preceding year. >>> >>> I tweaked Roberts code a little to loop over years and defined the >>> middle of a year as the peak fire day. The code runs through, yet sets >>> the start of the fire season for some subsets to Jan 1st, while my >>> educated guess is that it should be somewhere around mid to end of >>> December. Something went wrong, but I can't spot the glitch in the >> code >>> below. Can someone please help? >>> >>> Thanks a lot in advance and best regards, >>> Daniel >>> >>> >>> *** start >>> forv y = `yearfirst'/`yearlast' { >>> >>> * keep previous year >>> if `y' != `yearfirst' { >>> keep if Year == `y' | Year == (`y'-1) >>> } >>> bys Day: g no_fire_day = _N >>> qui su no_fire_day >>> >>> * define year to start 183 days before peak fire day >>> loc yearstart = Day[r(max)] - 183 >>> loc yearend = `yearstart' + 365 >>> keep if Day> `yearstart'& Day< `yearend' // or with >> egen->rotate? >>> bys Day: keep if _n == _N >>> g nobs = _n >>> >>> * the target is a continuous run that includes 95% of all fires >>> sum no_fire_day, meanonly >>> scalar target = .95 * r(sum) >>> >>> scalar shortlen = . >>> gen arun = . >>> gen bestrun = . >>> >>> * at each pass, create a run that starts at nobs == `i' >>> * and identify the nobs where the number of fires>= 95% >>> local more 1 >>> local i = 0 >>> while `more' { >>> local i = `i' + 1 >>> qui replace arun = sum(no_fire_day * (nobs>= `i')) >>> sum nobs if arun>= target, meanonly >>> if r(N) == 0 local more 0 >>> else if (Day[r(min)] - Day[`i'])< shortlen { >>> scalar shortlen = Day[r(min)] - Day[`i'] >>> qui replace bestrun = arun >>> qui replace bestrun = . if nobs> r(min) | nobs< `i' >>> } >>> } >>> qui drop if bestrun == . >>> drop bestrun arun >>> save fires_`y', replace >>> } >>> *** end >>> >>> >>> >>> >>> >>> Robert Picard wrote on 5/11/2010 3:28 AM: >>>> Here is how I would approach this problem. I would do each year >>>> separately; it could be done all at once but it would complicate the >>>> code unnecessarily. If the fire data is one observation per fire, I >>>> would -collapse- it to one observation per day. Each observation >> would >>>> contain the number of fires that day. The following code will >> identify >>>> the first instance of the shortest run of days that includes 95% of >>>> fires for the year. >>>> >>>> Note that the following code will work, even if there are days >> without >>>> fires (and thus no observation for that day). >>>> >>>> *--------------------------- begin example ----------------------- >>>> version 11 >>>> >>>> * daily fire counts; with some days without fires >>>> clear all >>>> set seed 123 >>>> set obs 365 >>>> gen day = _n >>>> drop if uniform()< .1 >>>> gen nobs = _n >>>> gen nfires = round(uniform() * 10) >>>> >>>> * the target is a continuous run that includes 95% of all fires >>>> sum nfires, meanonly >>>> scalar target = .95 * r(sum) >>>> dis target >>>> >>>> scalar shortlen = . >>>> gen arun = . >>>> gen bestrun = . >>>> >>>> * at each pass, create a run that starts at nobs == `i' >>>> * and identify the nobs where the number of fires>= 95% >>>> local more 1 >>>> local i 0 >>>> while `more' { >>>> local i = `i' + 1 >>>> qui replace arun = sum(nfires * (nobs>=`i')) >>>> sum nobs if arun>= target, meanonly >>>> if r(N) == 0 local more 0 >>>> else if (day[r(min)] - day[`i'])< shortlen { >>>> scalar shortlen = day[r(min)] - day[`i'] >>>> qui replace bestrun = arun >>>> qui replace bestrun = . if nobs> r(min) | nobs< `i' >>>> } >>>> } >>>> >>>> *--------------------- end example -------------------------- >>>> >>>> >>>> Hope this help, >>>> >>>> Robert >>>> >>>> On Mon, May 10, 2010 at 6:19 AM, Nick Cox<n.j.cox@durham.ac.uk> >>> wrote: >>>>> I don't think any trick is possible unless you know in advance the >>>>> precise distribution, e.g. that it is Gaussian, or exponential, or >>>>> whatever, which here is not the case. >>>>> >>>>> So, you need to look at all the possibilities from the interval >>> starting >>>>> at the minimum to the interval starting at the 5% point of the fire >>>>> number distribution in each year. >>>>> >>>>> However, this may all be achievable using -shorth- (SSC). Look at >> the >>>>> -proportion()- option, but you would need to -expand- first to get a >>>>> separate observation for each fire. If that's not practicable, look >>>>> inside the code of -shorth- to get ideas on how to proceed. Note >> that >>> no >>>>> looping is necessary: the whole problem will reduce to use of -by:- >>> and >>>>> subscripts. >>>>> >>>>> Nick >>>>> n.j.cox@durham.ac.uk >>>>> >>>>> Daniel Mueller >>>>> >>>>> I have a strongly unbalanced panel with 100,000 observations (=fire >>>>> occurrences per day) that contain between none (no fire) and 3,000 >>> fires >>>>> >>>>> per day for 8 years. The fire events peak in March and April with >>> about >>>>> 85-90% of the yearly total. >>>>> >>>>> My question is how I can define the shortest possible continuous >>> period >>>>> of days for each year that contains 95% of all yearly fires. The >>> length >>>>> and width of the periods may slightly differ across the years due to >>>>> climate and other parameters. >>>>> >>>>> I am sure there is a neat trick in Stata for this, yet I have not >>>>> spotted it. Any suggestions would be appreciated. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/