Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")? |
Date | Tue, 25 Sep 2012 10:12:06 +0100 |
You did say that and I overlooked it. The extra code follows in turn
from an FAQ and the principles discussed in my paper previously cited.

FAQ     . . . . . . Listing observations in a group that differ on a variable
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        11/01   How do I list observations in a group that differ
                on a variable?
                http://www.stata.com/support/faqs/data/diff.html

bysort diagnosis group : keep if _N > 100
by diagnosis : drop if group[1] == group[_N]

If only one -group- is represented for each -diagnosis- then
necessarily the first and last are the same.

Nick

On Tue, Sep 25, 2012 at 9:28 AM, Caliph Omar Moumin <sheikmoumin@yahoo.com> wrote:
> Thanks, Nick, for your quick reply.
>
> When I apply this command, it keeps a diagnosis if either of the two groups has >= 100 observations, which means there are cases in which one of the groups has 0 observations.
> I would like to keep a diagnosis if and only if both groups have >= 100 observations.

From: Nick Cox <njcoxstata@gmail.com>
> Your title said ">="; your text varies between ">=" and "more than";
> clearly you need to choose between ">=" and ">".
>
> On Tue, Sep 25, 2012 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> This is a simple application of -by:-, with which all long-term Stata
>> users should be familiar.
>>
>> bysort diagnosis group : keep if _N > 100
>>
>> Note that this procedure just counts observations, and is indifferent
>> to missing values. If you have missing values on key variables, -drop-
>> them first.
>>
>> Read the sections on -by:- in [U]. Then for a discursive tutorial on -by:-, see
>>
>> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata: How to move step by: step
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>         Q1/02   SJ 2(1):86--102                                 (no commands)
>>         explains the use of the by varlist : construct to tackle
>>         a variety of problems with group structure, ranging from
>>         simple calculations for each of several groups to more
>>         advanced manipulations that use the built-in _n and _N
>>
>>
>> Nick
>>
>> On Tue, Sep 25, 2012 at 7:53 AM, Caliph Omar Moumin
>> <sheikmoumin@yahoo.com> wrote:
>>>
>>> I have a large dataset with more than 500,000 observations and more than 7000 diagnoses, grouped into two groups: alcohol, coded as "1", and nonalcohol, coded as "0".
>>> The data structure is like this:
>>>
>>> obs      id     diagnosis  group ............ other variables
>>> 1        2338   A120       1
>>>
>>> 2        3838   m23        0
>>> .
>>> .
>>> .
>>> .
>>> 500,000  45566  y678       1
>>>
>>>
>>> So I want to keep a diagnosis if the number of observations is >= 100 for both groups, alcohol and nonalcohol. For example, if diagnosis A120 has more than 100 observations for both alcohol and nonalcohol, keep it; if not, drop it.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
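
The two-step approach in the reply (keep only the diagnosis-group cells that meet the threshold, then drop any diagnosis in which only one group survives) can be tried on a small made-up dataset. What follows is only a sketch: the data values are invented for illustration, a threshold of 3 stands in for 100, and ">=" is used as in the subject line rather than the ">" of the posted code. After the first command the data are sorted by -diagnosis- and then -group-, so within a diagnosis group[1] is the smallest surviving group code and group[_N] the largest; if they are equal, only one group is left.

```stata
* Hypothetical toy data: threshold 3 stands in for 100
clear
input str4 diagnosis byte group
"A120" 1
"A120" 1
"A120" 1
"A120" 0
"A120" 0
"A120" 0
"m23"  0
"m23"  0
"m23"  0
"m23"  1
"y678" 1
"y678" 1
"y678" 1
end

* Step 1: keep diagnosis-group cells with at least 3 observations
* (m23 loses its single group-1 observation here)
bysort diagnosis group : keep if _N >= 3

* Step 2: within each diagnosis, first and last group codes are equal
* exactly when only one group is left, so drop those diagnoses
by diagnosis : drop if group[1] == group[_N]

* Only A120 has both groups at or above the threshold
assert diagnosis == "A120"
assert _N == 6
tabulate diagnosis group
```

With real data, replace 3 by 100 and choose between ">=" and ">" as discussed in the thread. As noted in the earlier reply, this counts observations regardless of missing values, so -drop- observations with missing values on key variables first if that matters.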