Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: FW: Max/Min and Growth rates among subsample(by year) of panel data |
Date | Fri, 16 Aug 2013 18:27:45 +0100 |
Here is a trick which I like. bysort fips_num: egen peakRGDP = max(RGDP) if year >2001 & year <2010 bysort fips_num (peakRGDP): replace peakRGDP = peakRGDP[1] can be put in one line bysort fips_num: egen peakRGDP = max(RGDP / (year >2001 & year <2010)) Here's how it works: (year >2001 & year <2010) is true or false, evaluated as 1 when true and 0 when false. RGDP / (year >2001 & year <2010) is thus RGDP/1 = RGDP or RGDP/0 = missing. Normally, dividing by 0 is something you just know not to do, but here it is totally deliberate: you know the result will be missing, but as usual Stata will just ignore the missings, which is exactly what you want. max(RGDP / (year >2001 & year <2010)) therefore acts as if it were a conditional. You get a maximum, but conditionally It's purely psychology, and accidental similarity of notation, but you will know | as indicating a conditional, and here it's just a little tired and leaning over slightly and is shown as /. There are other ways to do it. Here's one: bysort fips_num: egen peakRGDP = max(cond(year >2001 & year <2010, RGDP, .)) The -max()- function of -egen- takes an expression, which gives extra scope here. (You could also use -twithin()-, by the way.) As for when the peak or trough occurs, that's the same trick bysort fips_num: egen whenpeak = min(year / (RGDP == peakRGDP)) Note that this code copes with ties too. If two or more years tie for peak, it takes the first. If you want another definition, go for -max()- or whatever else makes sense. I am bailing out here, but I think some of this carries over to the rest of your problem. Footnotes: I see no value in a -reshape- here. The divide by zero trick is written up in http://www.stata-journal.com/article.html?article=dm0055 Nick njcoxstata@gmail.com On 16 August 2013 17:31, Nickolas Lyell <nlyell@naco.org> wrote: > I've been trying to analyze a large database of counties and their economic indicators from 1990-2013. > > I have it formatted long by fips number (a unique identifier for each US county) and year, with the indicators wide. > > I am trying to calculate several indicators within a subsample of years by county. For instance, I would like to know the peak pre-recession level of real GDP (RGDP). Since I don't want the 90's or post-recession data to show up, I would like to limit the maximum to 2002-2009. > > To get this value and its year as variables in the dataset, I did the following: > > sort fips_num year > by fips_num: egen peakRGDP = max(RGDP) if year >2001 & year <2010 bysort fips_num (peakRGDP): replace peakRGDP = peakRGDP[1] by fips_num: gen peakRGDPyear = year if peakRGDP==RGDP bysort fips_num (peakRGDPyear): replace peakRGDPyear = peakRGDPyear[1] > > and the same thing for the trough value: > > sort fips_num year > by fips_num: egen troughRGDP = min(RGDP) if year >peakRGDPyear bysort fips_num (troughRGDP): replace troughRGDP = troughRGDP[1] by fips_num: gen troughRGDPyear = year if troughRGDP==RGDP bysort fips_num (troughRGDPyear): replace troughRGDPyear = troughRGDPyear[1] > > which works, but feels clunky. And neither does it help me to calculate annualized growth rates among subsamples of years. I would like to find out, for instance the annualized growth rate of RGDP from 1990-peakRGDPyear, peakRGDPyear-troughRGDPyear, troughRGDPyear-2013. > > Is there a better way to do this? Maybe I need to convert my data back to wide with respect to year and use rowmax and rowmin functions, but that also seems clunky. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/