Re: st: Keep/Drop Observations for Top/Bottom X%
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Keep/Drop Observations for Top/Bottom X%
Date: Thu, 11 Oct 2012 11:07:27 +0100
That's undoubtedly correct. If you keep observations in memory that
you don't use, then indeed every analysis command needs an -if-
qualifier. It's best to generate an indicator, say
. gen thisuse = inrange(mpg, 29, .)
and follow with commands -if thisuse-.
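A minimal Stata sketch of the indicator approach, run on the auto data used further down the thread. The variable name top10 and the choice of -summarize-, -regress- and -ttest- are illustrative assumptions, not part of the original advice. The cutoff comes from -_pctile- rather than being typed in, and because nothing is dropped the flagged and unflagged groups can still be compared.

```stata
* Sketch: flag the top 10% of mpg instead of dropping observations.
* top10 is a hypothetical variable name chosen for illustration.
sysuse auto, clear

* r(r1) holds the 90th percentile of mpg
_pctile mpg, p(90)

* inrange(z, a, .) has no upper bound and returns 0 when z is missing
generate byte top10 = inrange(mpg, r(r1), .)

* Every analysis of the subset carries the -if- qualifier
summarize price weight if top10
regress price weight if top10

* The full dataset is still in memory, so the two groups can be compared
tabulate top10
ttest price, by(top10)
```
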
When people want to do this, in my experience they want to experiment
with different subsets, which on the -drop- strategy would usually mean
reading the whole dataset back in each time. Also, with the -drop-
strategy you can't compare those -drop-ped with those not -drop-ped.
I do -drop- inconvenient observations all the time, but for problems
like Lisa's I would lean slightly towards what I suggest. There is at
least a small downside to every way of doing this.
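A middle path between the two strategies, sketched here as an assumption rather than something proposed in the thread, is Stata's -preserve- and -restore-: the data are snapshotted, a subset is -keep-ed for convenient analysis without -if- on every command, and the full data come back without re-reading the file. The regression below is only a placeholder.

```stata
* Sketch: drop temporarily, then recover the full data without reloading.
sysuse auto, clear

preserve
* Keep the top 10% of mpg; inrange() with an open upper bound drops missings
_pctile mpg, p(90)
keep if inrange(mpg, r(r1), .)
regress price weight
summarize price
restore

* All 74 observations of the auto data are back in memory
describe, short
```

Note that -restore- also discards anything created while the data were preserved, so results worth keeping must be saved first. For several subsets in a row, -restore, preserve- restores the data while keeping the snapshot for the next round.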
Nick
On Thu, Oct 11, 2012 at 10:54 AM, Justina Fischer <[email protected]> wrote:
> Hi Nick,
>
> in principle you might be right.
>
> However, for practical reasons it is sometimes advisable in subset analysis simply to load the full data and drop part of it, rather than applying an 'if' restriction to every regression.
>
> HTH
>
> Justina
>
>
> -------- Original Message --------
>> Date: Thu, 11 Oct 2012 10:46:02 +0100
>> From: Nick Cox <[email protected]>
>> To: [email protected]
>> Subject: Re: st: Keep/Drop Observations for Top/Bottom X%
>
>> You need not -keep- or -drop- to do this; in fact -keep- or -drop-
>> here is usually a bad idea.
>>
>> (Furthermore, regressions of this kind are often more problematic than
>> they seem, but I'll let others expand on that if they wish.)
>>
>> For full flexibility here, skip -summarize- and go straight to -_pctile-.
>>
>> For example,
>>
>> . sysuse auto
>> (1978 Automobile Data)
>>
>> . _pctile mpg, p(10 90)
>>
>> . ret li
>>
>> scalars:
>> r(r1) = 14
>> r(r2) = 29
>>
>> So you can follow up with
>>
>> ... if mpg >= 29
>>
>> Warnings:
>>
>> 1. Watch out for ties.
>>
>> 2. Watch out for missing values at the top end.
>>
>> ... if mpg >= 29
>>
>> would include missings on -mpg- (if there were any), because Stata
>> treats missing as larger than any number. -if inrange(mpg, 29, .)-
>> excludes the missings.
>>
>> Nick
>>
>> On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang <[email protected]> wrote:
>>
>> > I am unsure how to keep or drop the top/bottom X% of observations
>> > of a variable. I would like to do this for further analysis on a
>> > subset of my data. For instance, I want to run further regressions
>> > on the top 10% of my observations based on 'distance from home'
>> > rather than on the whole dataset.
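Nick's two warnings about ties and missing values can be checked directly. The sketch below, again on the auto data (for Lisa's case a hypothetical distance variable would replace mpg), flags both the top and the bottom 10% and reports the share actually flagged. The names lo, hi, top10 and bottom10 are illustrative only.

```stata
* Sketch: flag top and bottom 10% of mpg and check the resulting shares.
* lo, hi, top10 and bottom10 are hypothetical names for illustration.
sysuse auto, clear

_pctile mpg, p(10 90)
* Save the cutoffs in locals, since later r-class commands such as
* summarize overwrite r()
local lo = r(r1)
local hi = r(r2)

* Missing counts as larger than any number, so use inrange() at the top
generate byte top10 = inrange(mpg, `hi', .)
* At the bottom, a plain <= already excludes missings
generate byte bottom10 = mpg <= `lo'

* The means are the flagged shares; ties at a cutoff mean they need
* not be exactly 10%
summarize top10 bottom10
```
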