Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?
From
Nick Cox <[email protected]>
To
[email protected]
Subject
Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?
Date
Tue, 25 Sep 2012 10:12:06 +0100
You did say that and I overlooked it.
The extra code follows in turn from an FAQ and the principles
discussed in my paper previously cited.
FAQ . . . . . . Listing observations in a group that differ on a variable
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
11/01 How do I list observations in a group that differ
on a variable?
http://www.stata.com/support/faqs/data/diff.html
bysort diagnosis group : keep if _N > 100
by diagnosis : drop if group[1] == group[_N]
If only one -group- is represented for each -diagnosis- then
necessarily the first and last are the same.
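
To see the two steps working together, here is a minimal sketch on made-up data. The toy observations and the local macro -min- are assumptions for illustration only. The threshold is lowered to 3 so the example stays short, and it uses ">=" as in the thread title; switch to ">" if that is what you mean.

```stata
* Toy data (hypothetical): three diagnoses, two groups
clear
input str4 diagnosis byte group
"A120" 1
"A120" 1
"A120" 1
"A120" 0
"A120" 0
"A120" 0
"M23"  1
"M23"  1
"M23"  1
"M23"  0
"Y678" 0
"Y678" 0
"Y678" 0
end

* Threshold: 3 here for brevity; use 100 for the real data
local min 3

* Step 1: keep only diagnosis-group cells with enough observations
bysort diagnosis group : keep if _N >= `min'

* Step 2: within each diagnosis, if the first and last values of group
* are equal, only one group survived step 1, so drop that diagnosis
by diagnosis : drop if group[1] == group[_N]

tabulate diagnosis group
```

In this sketch M23 is dropped because its group 0 has too few observations, and Y678 because it has no group 1 at all; only A120 remains. The second -by:- works without re-sorting because data sorted by diagnosis and group are also sorted by diagnosis.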
Nick
On Tue, Sep 25, 2012 at 9:28 AM, Caliph Omar Moumin
<[email protected]> wrote:
> Thanks, Nick, for your quick reply.
>
> When I apply this command, it keeps a diagnosis if either of the two groups has >= 100 observations, which means there are cases in which one of the groups has 0 observations.
> I would like to keep a diagnosis if and only if both groups have >= 100 observations.
From: Nick Cox <[email protected]>
> Your title said ">="; your text varies between ">=" and "more than";
> clearly you need to choose between ">=" and ">".
>
> On Tue, Sep 25, 2012 at 8:31 AM, Nick Cox <[email protected]> wrote:
>> This is a simple application of -by:-, with which all long-term Stata
>> users should be familiar.
>>
>> bysort diagnosis group : keep if _N > 100
>>
>> Note that this procedure just counts observations, and is indifferent
>> to missing values. If you have missing values on key variables, -drop-
>> them first.
>>
>> Read the sections on -by:- in [U]. Then for a discursive tutorial on -by:-, see
>>
>> SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>> Q1/02 SJ 2(1):86--102 (no commands)
>> explains the use of the by varlist : construct to tackle
>> a variety of problems with group structure, ranging from
>> simple calculations for each of several groups to more
>> advanced manipulations that use the built-in _n and _N
>>
>>
>> Nick
>>
>> On Tue, Sep 25, 2012 at 7:53 AM, Caliph Omar Moumin
>> <[email protected]> wrote:
>>>
>>> I have a large dataset with more than 500,000 observations and more than 7,000 diagnoses. The observations fall into two groups: alcohol, coded as "1", and non-alcohol, coded as "0".
>>> The data structure is like this:
>>>
>>> obs id diagnosis group............other variables
>>> 1 2338 A120 1
>>>
>>> 2 3838 m23 0
>>> .
>>> .
>>> .
>>> .
>>> 500,000 45566 y678 1
>>>
>>>
>>> So I want to keep observations only if there are >= 100 for both groups, alcohol and non-alcohol, within each diagnosis. For example, if diagnosis A120 has more than 100 observations for both alcohol and non-alcohol, keep it; if not, drop it.