Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Combinations of variables
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Combinations of variables
Date: Tue, 4 Jun 2013 17:04:01 +0100
It is perhaps pertinent to point out that basic Stata commands can get
you close:
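* number of observations sharing each combination of <varlist>
bysort <varlist> : gen freq = _N
* tag one observation per combination
bysort <varlist> : gen tag = _n == 1
list <varlist> freq if tag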
But then people often want to see percents, to condition on -if- and
-in-, and so on, and so start to prefer a canned command.
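A minimal sketch of how the recipe above can be extended by hand, assuming the auto data used elsewhere in this thread, with foreign and rep78 standing in for the variables of interest (for the question's 12 variables, substitute their names, e.g. the hypothetical var1-var12). It excludes observations with missing values, adds percents of the complete observations, and sorts so the most frequent combinations come first. The names ok, freq, tag and pc are illustrative choices, not part of any command.

```stata
* Sketch: foreign and rep78 stand in for the variables of interest
sysuse auto, clear

* ok = 1 for observations with no missing values on those variables
generate byte ok = !missing(foreign, rep78)

* frequency of each combination, plus a tag on one observation per combination
bysort ok foreign rep78 : generate freq = _N
by ok foreign rep78 : generate byte tag = _n == 1

* percent of complete observations (count leaves that number in r(N))
count if ok
generate pc = 100 * freq / r(N)
format pc %5.2f

* most frequent combinations first
gsort -ok -freq
list foreign rep78 freq pc if tag & ok, noobs
```

Because incomplete observations are left out of the denominator, the percents should agree with the Percent column that -groups- reports for the same two variables.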
Nick
[email protected]
On 4 June 2013 16:34, Seliger Florian <[email protected]> wrote:
> Thank you Nick, that helped a lot.
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Tuesday, 4 June 2013 16:21
> To: [email protected]
> Subject: Re: st: Combinations of variables
>
> There are several ways to get at this. One I like, for reasons easy to infer, is to use -groups- from SSC. The example here uses just two categorical variables, but having more variables is fine, just messier.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . groups foreign rep78
>
> +------------------------------------+
> |  foreign   rep78   Freq.   Percent |
> |------------------------------------|
> | Domestic       1       2      2.90 |
> | Domestic       2       8     11.59 |
> | Domestic       3      27     39.13 |
> | Domestic       4       9     13.04 |
> | Domestic       5       2      2.90 |
> |------------------------------------|
> |  Foreign       3       3      4.35 |
> |  Foreign       4       9     13.04 |
> |  Foreign       5       9     13.04 |
> +------------------------------------+
>
> Note that -contract- would give you an easy answer, at the cost of replacing the dataset in memory with the table of combinations.
>
> Nick
> [email protected]
>
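A sketch of the -contract- route that keeps the original data, assuming -preserve- and -restore- are acceptable in the workflow. The -nomiss- option drops observations with missing values and -percent()- adds a percent variable (pc is an illustrative name). If all goes as expected, the top row should be Domestic, 3, 27, as in the -groups- table above.

```stata
sysuse auto, clear
preserve

* contract replaces the data in memory with one observation per combination,
* with the count in _freq; nomiss drops observations with missing values
contract foreign rep78, nomiss percent(pc)

* most frequent combinations first
gsort -_freq
list, noobs

* bring back the original 74 observations
restore
```

Once -restore- has run, the contracted table is gone, so anything needed from it should be listed or saved before that point.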
> On 4 June 2013 15:14, Seliger Florian <[email protected]> wrote:
>
>> I need to find the most frequent combinations of variables in my dataset.
>> There are 12 variables of interest, each coded 0/1.
>>
>> Example:
>>
>> ID  var1  var2  var3  ...
>>  1     0     1     0
>>  2     0     0     1
>>  3     0     1     0
>>  4     1     1     1
>>  5     0     1     0
>>  .
>>  .
>>  .
>>
>> In this example, the most frequent combination is var1=0, var2=1, var3=0 (for ID 1, 3, 5).
>>
>> At the moment, I have no idea how to find out the combinations for so many different cases automatically.
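For 0/1 variables like these, one further possibility (not one proposed in the thread) is to paste each observation's values into a single string pattern with -egen, concat()- and then tabulate that one variable in order of frequency. This sketch reads in only the five example rows shown in the question; with the real data, var1-var3 would become the 12 variables of interest, giving 12-character patterns. The name pattern is an illustrative choice.

```stata
clear
input id var1 var2 var3
1 0 1 0
2 0 0 1
3 0 1 0
4 1 1 1
5 0 1 0
end

* one string per observation, e.g. "010" means var1=0, var2=1, var3=0
egen pattern = concat(var1-var3)

* one-way table in descending order of frequency
tabulate pattern, sort
```

On these rows, "010" should head the table with 3 observations (IDs 1, 3 and 5), matching the answer given in the question.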