It is simpler than you seem to fear.
Think in terms of a scatter plot. With two dummy
variables, the possible data points in
general are the 4 corners of the unit
square. The correlation, treating these
numerically, will have absolute value 1 if and
only if the points populated in practice
are just two diagonally opposite corners.
That is, with
* 1,1
* 0,0
the correlation would be 1, and with
* 0,1
* 1,0
the correlation would be -1. In either
case a straight line would be a perfect
fit to the data, irrespective of how
many data points fall on each corner,
so long as some do.
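
A minimal sketch of that claim in Python rather than Stata, only so it can be run anywhere. The corner counts (5 and 900) are made up for illustration and are not from your data; `pearson` and `expand` are hypothetical helper names.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def expand(counts):
    """Turn {(x, y): n} corner counts into two parallel lists of 0/1 values."""
    xs, ys = [], []
    for (x, y), n in counts.items():
        xs += [x] * n
        ys += [y] * n
    return xs, ys

# Hypothetical, deliberately lopsided counts on two opposite corners.
same = pearson(*expand({(1, 1): 5, (0, 0): 900}))      # corners 1,1 and 0,0
opposite = pearson(*expand({(0, 1): 5, (1, 0): 900}))  # corners 0,1 and 1,0
print(same, opposite)
```

Up to floating-point rounding, the two results are 1 and -1, however unevenly the points are split between the two corners.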
In practice, in your dataset the data fall on
3 of the 4 corners, so we can't say anything
so simple: the correlation will depend on the
votes cast, as it were, that is, on how many
observations sit at each corner.
With this election result

              wifelit = 0   wifelit = 1
wifeprim = 1         2458             0
wifeprim = 0        11119           739

the best-fit line would clearly tilt downwards,
but fairly gently, so the correlation looks fine by me,
qua correlation.
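
As a check on the arithmetic, here is a short Python sketch (not Stata) that computes the correlation of two 0/1 variables directly from the four cell counts you posted. For two dummies the Pearson correlation reduces to the so-called phi coefficient of the 2 by 2 table. The function name `phi_from_counts` and its argument names are my own labels, not anything built into Stata.

```python
from math import sqrt

def phi_from_counts(n00, n01, n10, n11):
    """Pearson correlation of two 0/1 variables from the 2 by 2 cell counts.

    nXY is the number of cases with wifelit == X and wifeprim == Y.
    """
    lit1 = n10 + n11   # cases with wifelit == 1
    lit0 = n00 + n01   # cases with wifelit == 0
    prim1 = n01 + n11  # cases with wifeprim == 1
    prim0 = n00 + n10  # cases with wifeprim == 0
    return (n11 * n00 - n10 * n01) / sqrt(lit1 * lit0 * prim1 * prim0)

# Counts as reported in the thread (obs = 14316).
r = phi_from_counts(n00=11119, n01=2458, n10=739, n11=0)
print(round(r, 4))
```

This gives -0.1062, in agreement with the -corr- output. The numerator is negative only because of the 739 * 2458 term, and it is small relative to the denominator because the bulk of the cases sit at 0,0.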
Nick
[email protected]
Kenley Barrett
> I'm sorry, I should have included all possible counts. I have pasted
> them below. To be sure that I understand properly: this correlation
> coefficient is due to the fact that although a value of 1 for wifelit
> guarantees a value of 0 for wifeprim, and a value of 1 for wifeprim
> guarantees a value of 0 for wifelit, a value of 0 for wifeprim does
> NOT guarantee a value of 1 for wifelit, and a value of 0 for wifelit
> does NOT guarantee a value of 1 for wifeprim. So the correlation
> coefficient should not be -1 (as I was thinking earlier). Could you
> please confirm for me that I'm understanding this right? I'm sorry to
> bother you again; I am new at this, as you can tell.
>
> . count if wifelit == 1 & wifeprim == 1
> 0
>
> . count if wifelit == 0 & wifeprim == 1
> 2458
>
> . count if wifelit == 0 & wifeprim == 0
> 11119
>
> . count if wifelit == 1
> 739
>
> . count if wifeprim == 1
> 2458
>
>
> . count if wifelit == 1 & wifeprim == 0
> 739
>
> . corr wifelit wifeprim
> (obs=14316)
>
>              |  wifelit wifeprim
> -------------+------------------
>      wifelit |   1.0000
>     wifeprim |  -0.1062   1.0000
> On Thu, 17 Feb 2005 15:58:32 -0000, Nick Cox
> <[email protected]> wrote:
> > You evidently have two dummies here, both 0 or 1.
> >
> > You give two of the four possible
> > counts, from which we can infer that
> > in 14316 - 2458 cases the values are 1 0 or 0 0.
> >
> > That seems entirely consistent with the correlation
> > you get. The entire 2 by 2 table from -tab wifeprim
> > wifelit- is the context for the correlation.
> >
> > Nick
> > [email protected]
> >
> > Kenley Barrett
> >
> > > I am getting strange results when I run the "corr" command on my
> > > variables. From my understanding, "corr" gives the correlation
> > > coefficient, so if a value of 1 for Dummy Variable A guarantees a
> > > value of 0 for Dummy Variable B, then corr should give a result of -1.
> > > But instead I am getting values between 0 and -1. A sample of two
> > > variables shown below:
> > >
> > > . count if wifelit == 1 & wifeprim == 1
> > > 0
> > >
> > > . count if wifelit == 0 & wifeprim == 1
> > > 2458
> > >
> > > . corr wifelit wifeprim
> > > (obs=14316)
> > >
> > >              |  wifelit wifeprim
> > > -------------+------------------
> > >      wifelit |   1.0000
> > >     wifeprim |  -0.1062   1.0000
> > >
> > > What could be the problem? Am I misunderstanding the corr command?