From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Flagging most frequent occurrence
Date: Thu, 24 Oct 2013 09:11:06 +0100

The most frequent value is often called the mode, and that's a keyword to use in a -search-. In fact, -egen- has a -mode()- function, although it is easier here to avoid it.

Maarten has given one solution, but he has flagged year(s) that occur most frequently in the dataset as a whole. Here is another solution that flags year(s) that occur most frequently within each panel, which Steven seems to be asking for.

Note that Steven's

replace flag=1 most_freq==year

is lacking an -if-.

My suggestion:

bysort id year : gen count = _N
bysort id (count) : gen ismode = count == count[_N]

Under the hood, -egen- is most often doing stuff like this, using -by:-, sorting and heavy use of _n and _N and getting indicator variables out of true-or-false evaluations (1 is true and 0 is false).

In the second statement just above, we -sort- the values with the highest count to the end of each panel; then the modes are just the values with the highest count, and this works even if there are ties for -year-.

I don't see why Steven's code isn't equivalent, assuming correction of the typo above.

Nick
njcoxstata@gmail.com

On 24 October 2013 08:40, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>> I have panel data, where observations occur in different years. I want
>> to flag the year that occurs the most often.
>
> *------------------ begin example ------------------
> // input some example data
> clear all
> input ///
> id year
> 1 2008
> 1 2008
> 1 2009
> 2 2009
> 2 2009
> 2 2010
> 2 2010
> 3 2009
> 3 2009
> 3 2010
> end
>
> // compute the flag
> bys year : gen flag = _N
> sum flag, meanonly
> replace flag = (flag == r(max))
>
> // admire the result
> sort id year
> list, sepby(id)
> *------------------- end example -------------------
> * (For more on examples I sent to the Statalist see:
> * http://www.maartenbuis.nl/example_faq )
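
For anyone who wants to try the per-panel suggestion on Maarten's example data, here is a minimal sketch. It reuses the two -bysort- statements suggested above. The final -egen- line is an optional cross-check that is not part of the original suggestion, and the variable name modeyear is made up for illustration. In this data id 2 has a tie between 2009 and 2010, so ismode should flag all four of its observations, whereas -egen, mode()- with -maxmode- reports only one of the tied years.

```stata
* A sketch only: same example data as in Maarten's post
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* number of observations sharing each id-year combination
bysort id year : gen count = _N

* within each id, sort the largest count to the end, then flag every
* observation whose count equals that maximum (ties are all flagged)
bysort id (count) : gen ismode = count == count[_N]

* optional cross-check (an addition, not from the original suggestion):
* with ties, maxmode keeps only the highest tied year
bysort id : egen modeyear = mode(year), maxmode

sort id year
list, sepby(id)
```

The design point is that ismode is an indicator on each observation, so ties are handled without any extra option, while -egen, mode()- returns a single value per panel and needs -minmode-, -maxmode- or -nummode()- to decide among tied modes.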