RE: st: Code to generate dummy variable from several categorical variables?
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Code to generate dummy variable from several categorical variables?
Date: Tue, 17 Jan 2012 21:37:59 +0000
Note that David's suggestion of a composite categorical variable as one way to tackle this echoes
http://www.stata.com/statalist/archive/2012-01/msg00549.html
in which
egen group = group(A B C), label missing
was flagged as possible code. Deciding between that and
egen group2 = group(A B C), label
comes down to a decision on how to treat missing values.
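To see how the two calls differ, here is a minimal sketch on made-up data. The six observations are hypothetical and are not from the original question; only the two -egen- calls come from the message above.

```stata
* Hypothetical toy data: three binary outcomes, some values missing
clear
input A B C
0 0 0
1 0 0
1 1 0
0 1 1
. 1 0
1 . .
end

* With -missing-, a missing value is treated as a category of its own,
* so every observation is assigned to some group
egen group = group(A B C), label missing

* Without -missing-, an observation with a missing value in any of
* A, B, or C gets a missing group code
egen group2 = group(A B C), label

* Cross-tabulate the two classifications, showing missing codes
tabulate group group2, missing

* Count the observations that the second version leaves unclassified
count if missing(group2)
```

If the unclassified observations matter for the comparison, the first version keeps them; otherwise the second drops them from any analysis by group.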
Nick
[email protected]
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Hoaglin
Sent: 17 January 2012 21:23
To: [email protected]
Subject: Re: st: Code to generate dummy variable from several categorical variables?
Deborah,
The additional description is helpful. Thank you.
I would describe your planned ANOVAs as a preliminary analysis,
comparing the continuous demographic variables among groups defined by
the three outcome variables A, B, and C (jointly).
In a one-way ANOVA, the groups must be mutually exclusive. From your
initial message, some subjects have both A=1 and B=1 (and other
combinations in which more than one of the outcome variables is
nonzero). As a result, the groups defined by your three indicator variables
are not mutually exclusive.
Since you want to consider the three outcome variables together, I
think you have two main choices. Either you can enumerate the
combinations of A, B, and C that occur in your data (all 8 or only
some of the 8?), define a categorical variable that has a distinct
value for each of those mutually exclusive groups, and use that
variable to define the groups in a one-way ANOVA; or you can consider
a three-way ANOVA with A, B, and C as the factors and decide which
terms to include in the model (only main effects, main effects and
two-factor interactions, or main effects and two-factor and
three-factor interactions).
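As a rough sketch of the two choices, the following uses simulated data. The variable names age and abc, the sample size, and the simulated effects are all assumptions made for illustration; the factor-variable notation assumes Stata 11 or later.

```stata
* Simulated stand-in data (hypothetical): binary outcomes A, B, C
* and a continuous demographic variable age
clear
set seed 12345
set obs 200
generate byte A = runiform() < 0.4
generate byte B = runiform() < 0.3
generate byte C = runiform() < 0.5
generate age = 40 + 5*A + 3*B + rnormal(0, 10)

* Choice 1: one categorical variable with a distinct value for each
* combination of A, B, and C that occurs, then a one-way ANOVA
egen abc = group(A B C), label
tabulate abc
oneway age abc, tabulate

* Choice 2: three-way ANOVA with A, B, and C as the factors
* main effects only
anova age A B C
* main effects and two-factor interactions
anova age A B C A#B A#C B#C
* main effects, two-factor and three-factor interactions
anova age A##B##C
```

The -tabulate- step shows which of the 8 combinations are present and how many observations each has, which is worth knowing before deciding between the two approaches.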
Once you have settled on the mutually exclusive groups (and before any
ANOVA), it would be a good idea to check whether each of the
demographic variables is suitable for an ANOVA or should be
transformed. Making boxplots of the demographic variable by group
would be one way to start.
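A minimal sketch of that check, again on simulated data. The variable income is hypothetical and is deliberately generated as right-skewed, so that the effect of a log transformation is visible; abc is an assumed name for the group variable.

```stata
* Simulated stand-in data (hypothetical): binary outcomes and a
* right-skewed demographic variable
clear
set seed 2012
set obs 200
generate byte A = runiform() < 0.4
generate byte B = runiform() < 0.3
generate byte C = runiform() < 0.5
generate income = exp(rnormal(10, 0.8))

* Mutually exclusive groups from the combinations of A, B, and C
egen abc = group(A B C), label

* Boxplots on the raw scale: look for skewness, outliers, and
* spread that varies from group to group
graph box income, over(abc) name(raw, replace)

* The same display after a log transformation, for comparison
generate log_income = ln(income)
graph box log_income, over(abc) name(logged, replace)
```

Whether a transformation is needed, and which one, depends on the actual variable; the logarithm here is only one common candidate.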
I hope this discussion helps.
David Hoaglin
On Tue, Jan 17, 2012 at 2:39 PM, DEBORAH L. HUANG
<[email protected]> wrote:
> Basically what I'm hoping to do is "collapse" the outcome variables A, B and
> C (all binary) into the new outcome indicator variable abnlX for ANOVA
> (e.g., comparison of mean age across indicators, among other continuous
> demographic variables).
>
> The new outcome variable abnlX would have 3 indicators (my mistake in the
> earlier message). As an indicator variable abnlX would be defined as
> follows:
>
> abnlX indicator #1 =0 if A is 0 or missing, B is 0/1/missing, C is
> 0/1/missing; =1 if A is 1, B is 0/1/missing, C is 0/1/missing
> abnlX indicator #2 =0 if B is 0 or missing, A is 0/1/missing, C is
> 0/1/missing; =1 if B is 1, A is 0/1/missing, C is 0/1/missing
> abnlX indicator #3 =0 if C is 0 or missing, A is 0/1/missing, B is
> 0/1/missing; =1 if C is 1, A is 0/1/missing, B is 0/1/missing
>
> Alternately for a categorical outcome variable abnlX it would be defined as
> follows:
> abnlX=0 if A=0 or missing & B=0 or missing & C=0 or missing
> abnlX=1 if A=1 & B=0/1/missing & C=0/1/missing
> abnlX=2 if B=1 & A=0/1/missing & C=0/1/missing
> abnlX=3 if C=1 & A=0/1/missing & B=0/1/missing
>
> Thank you again to everyone for your input, and hopefully this further
> clarifies my question.
>