Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Keep/Drop Observations for Top/Bottom X%
Date: Thu, 11 Oct 2012 10:46:02 +0100

You need not -keep- or -drop- to do this; in fact -keep- or -drop-
here is usually a bad idea. (Furthermore, regressions of this kind are
often more problematic than they seem, but I'll let others expand on
that if they wish.)

For full flexibility here, skip -summarize- and go straight to
-_pctile-. For example,

    . sysuse auto
    (1978 Automobile Data)

    . _pctile mpg, p(10 90)

    . ret li

    scalars:
                     r(r1) =  14
                     r(r2) =  29

So you can follow up with

    ... if mpg >= 29

Warnings:

1. Watch out for ties.

2. Watch out for missing values at the top end.

    ... if mpg >= 29

would include missings on -mpg- (if there were any).

-if inrange(mpg, 29, .)- excludes the missings.

Nick

On Thu, Oct 11, 2012 at 10:34 AM, Lisa Wang <lhwang0925@gmail.com> wrote:

> I am unsure as to how I would go about keeping or dropping the
> top/bottom X% of observations of a variable. I would like to do this
> for further analysis on a subset of my data. For instance, I want to
> do some further regressions for the top 10% of my observations based
> on 'distance from home' and not the whole data set.
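
A minimal sketch of the approach in the reply above, using the same auto data. The local macro names (lo, hi), the indicator variables (top10, bottom10) and the regression of price on weight are illustrative assumptions, not part of the original post. The same pattern should carry over to a variable such as 'distance from home'.

```stata
sysuse auto, clear

* 10th and 90th percentiles of mpg, left behind in r(r1) and r(r2)
_pctile mpg, p(10 90)
local lo = r(r1)
local hi = r(r2)

* Flag the tails instead of using -keep- or -drop- (names are hypothetical)
* inrange(mpg, `hi', .) is false when mpg is missing
generate byte top10 = inrange(mpg, `hi', .)
* missing counts as larger than any number, so <= never picks it up
generate byte bottom10 = mpg <= `lo'

* Ties at the cutoff can push the flagged share away from exactly 10%
count if top10
count if bottom10

* Illustrative follow-up on the subset; the full data set stays in memory
regress price weight if top10
```

Working with an indicator and an -if- qualifier leaves every observation available, so the same session can compare the subset with the whole data set.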