Re: st: Flagging most frequent occurrence
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Flagging most frequent occurrence
Date: Thu, 24 Oct 2013 09:11:06 +0100
The most frequent value is often called the mode, and that's a keyword
to use in a -search-. In fact, -egen- has a -mode()- function,
although it is easier here to avoid it.
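For comparison, here is a minimal sketch of what the -egen- route might look like. The variable names most_freq and flag are borrowed from Steven's line quoted below, and the small dataset is made up for illustration. By default -mode()- returns missing when two or more values tie, so an option such as -minmode- is needed to pick one of them.

```stata
* Sketch only: the data and the names most_freq and flag are illustrative
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
end

* mode of year within each id; minmode keeps the smallest year when modes tie
bysort id : egen most_freq = mode(year), minmode

* 1 where year equals the chosen mode, 0 otherwise
gen flag = year == most_freq

list, sepby(id)
```

In panel 2 the years 2009 and 2010 tie, and with -minmode- only 2009 ends up flagged. That is presumably one reason it is easier to avoid -mode()- when every tied year should be flagged.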
Maarten has given one solution, but he has flagged year(s) that occur
most frequently in the dataset as a whole. Here is another solution
that flags year(s) that occur most frequently within each panel, which
Steven seems to be asking for.
Note that Steven's
replace flag=1 most_freq==year
is lacking an -if-.
My suggestion:
bysort id year : gen count = _N
bysort id (count) : gen ismode = count == count[_N]
Under the hood, -egen- is most often doing stuff like this, using
-by:-, sorting and heavy use of _n and _N and getting indicator
variables out of true-or-false evaluations (1 is true and 0 is false).
In the second statement just above, we -sort- the values with the
highest count to the end of each panel; then the modes are just the
values with the highest count, and this works even if two or more
years tie for the highest count within a panel.
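To see the two statements at work, here is a self-contained run on the example data from Maarten's post quoted below. The comments are my reading of each step; nothing beyond the two -bysort- lines comes from the suggestion itself.

```stata
* Example data as in Maarten's post
clear
input id year
1 2008
1 2008
1 2009
2 2009
2 2009
2 2010
2 2010
3 2009
3 2009
3 2010
end

* number of observations in each id-year combination
bysort id year : gen count = _N

* within each id, sort on count so the largest count comes last;
* an observation is a mode if its count equals that largest count
bysort id (count) : gen ismode = count == count[_N]

sort id year
list, sepby(id)
```

In panel 1 only 2008 is flagged and in panel 3 only 2009. In panel 2 both years occur twice, so every observation has ismode equal to 1.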
I don't see why Steven's code isn't equivalent, assuming correction of
the typo above.
Nick
[email protected]
On 24 October 2013 08:40, Maarten Buis <[email protected]> wrote:
> On Thu, Oct 24, 2013 at 8:03 AM, Steven Archambault wrote:
>> I have panel data, where observations occur in different years. I want
>> to flag the year that occurs the most often.
>
> *------------------ begin example ------------------
> // input some example data
> clear all
> input ///
> id year
> 1 2008
> 1 2008
> 1 2009
> 2 2009
> 2 2009
> 2 2010
> 2 2010
> 3 2009
> 3 2009
> 3 2010
> end
>
> // compute the flag
> bys year : gen flag = _N
> sum flag, meanonly
> replace flag = (flag == r(max))
>
> // admire the result
> sort id year
> list, sepby(id)
> *------------------- end example -------------------
> * (For more on examples I sent to the Statalist see:
> * http://www.maartenbuis.nl/example_faq )
>