Re: st: Dataset of means from the three largest values of a group

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Dataset of means from the three largest values of a group
Date	Tue, 29 Nov 2011 09:14:39 +0000

The emphasis in the original posting was on creating a new dataset.
Let's underline that there are several ways to do that. Here's
another:

drop if missing(trade)
bysort country year (trade) : gen trade1 = trade[_N]
by country year : gen trade2 = trade[_N-1]
by country year : gen trade3 = trade[_N-2]
egen meanhighest = rowmean(trade1 trade2 trade3)
by country year : keep if _n == 1
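
For comparison, here is a minimal sketch of yet another route to the same
dataset, using -collapse- as mentioned in the original question. The toy
data reuse the values posted in the question; the fourth AFG 1999
observation (partner GBR, trade 100) is a hypothetical value added only so
that one group has more than three observations.

```stata
* Toy data: values from the question plus one hypothetical row (GBR, 100)
clear
input str3 country int year str3 partner double trade
"AFG" 1999 "USA" 12345
"AFG" 1999 "DEU" 9875
"AFG" 1999 "FRA" 25487
"AFG" 1999 "GBR" 100
"AFG" 2000 "USA" 5454
"AFG" 2000 "HUN" 5454
"HUN" 1999 "DEU" 58744
end

* Missing values sort last, so drop them before picking the largest values
drop if missing(trade)

* Within each country-year, keep at most the three largest values
bysort country year (trade) : keep if (_N - _n) < 3

* One observation per country-year holding the mean of the values kept
collapse (mean) meanhighest = trade, by(country year)
list, sepby(country)
```

Groups with only one or two observations simply contribute the mean of
whatever they have. Ties at the third rank do not affect the result,
because tied observations carry the same value of -trade-.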

On Mon, Nov 28, 2011 at 8:39 PM, Nick Cox <[email protected]> wrote:
> Someone might want ideas on how to handle the missings assumed
> previously not to exist.
>
> Here's one way:
>
> gen ismissing = missing(trade)
> bysort ismissing country year (trade)  :  gen tag  = (_N - _n) < 3
> by ismissing country year : egen meanhighest = mean(trade / tag)
> bysort country year (meanhighest) : replace meanhighest = meanhighest[1]
> drop ismissing
>
> Nick
>
> On Mon, Nov 28, 2011 at 8:25 PM, Nick Cox <[email protected]> wrote:
>> I assume no missing values for -trade-. "3" here evidently means "up to 3".
>>
>> bysort country year (trade)  :  gen tag  = (_N - _n) < 3
>>
>> by country year : egen meanhighest = mean(trade / tag)
>>
>> On why division by zero can be useful, see
>>
>> Cox, N. J. 2011. Speaking Stata: Compared with ...
>>        Stata Journal 11(2): 305--314. (SJ-11-2 dm0055, no commands)
>>        Reviews techniques for relating values to values in other
>>        observations.
>>
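
The division in mean(trade / tag) can be seen at work in a small sketch.
It assumes toy data for a single country-year; the value 100 and the helper
variable -kept- are hypothetical and serve only to make the intermediate
result visible.

```stata
* Toy data: one group of four observations (100 is a hypothetical value)
clear
input str3 country int year double trade
"AFG" 1999 12345
"AFG" 1999 9875
"AFG" 1999 25487
"AFG" 1999 100
end

* tag is 1 for the three largest values in each group and 0 otherwise
bysort country year (trade) : gen byte tag = (_N - _n) < 3

* trade / 1 returns trade; trade / 0 returns missing without an error
gen double kept = trade / tag

* egen's mean() ignores missings, so only the tagged values are averaged
by country year : egen double meanhighest = mean(trade / tag)
list, sepby(country year)
```

Here the smallest value gets tag 0, so -kept- is missing for it and the
mean is taken over the other three observations only.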
>> The second highest (silver medallist) is more robust-resistant to outliers:
>>
>> bysort country year (trade) : gen silver = trade[_N-1]
>>
>> Nick
>>
>> On Mon, Nov 28, 2011 at 8:03 PM, Iulian Ihnatov <[email protected]> wrote:
>>
>>> I have the following dataset for the period of 1999 to 2010:
>>> country    year    partner    trade
>>> AFG        1999    USA        12345
>>> AFG        1999    DEU         9875
>>> AFG        1999    FRA        25487
>>> ........................
>>> AFG        2000    USA         5454
>>> AFG        2000    HUN         5454
>>> ........................
>>> HUN        1999    DEU        58744
>>> ........................
>>>
>>> I need to create a dataset of means of the "trade" variable, grouped by
>>> country and year, but only for the three largest observations of each group.
>>> I may use - collapse (mean) trade, by(country year) -, but I don't know how
>>> to isolate the largest three values from each group (in some years, there
>>> are only 1 or 2 observations available, in others more than 10). Any help
>>> would be highly appreciated.
>>
>
