Paul seems to be implying that whether a binary variable
is nominal is somehow deeper or more fundamental than its
being binary. I don't accept that at all.
To repeat an earlier example:
Suppose you have two identical
dummy variables (and some variation in each).
In terms of a scatter plot, you have two clusters,
one at the origin (0,0) and one at (1,1), like this
            *
*
and a straight line is a perfect summary of such
data, and so the Pearson correlation is identically 1.
The graph above is label-free, and deliberately so,
as the result holds irrespective of coding. I could
code the two levels as 7 and 42 or any other distinct
numbers and the correlation is unchanged. And
I don't see any objection to calling that a linear
relationship.
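
As a quick check of the recoding claim, here is a minimal sketch in
Python (not Stata). The data are made up, and the helper names
pearson and recode are mine, purely for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def recode(values, low, high):
    """Map 0 -> low and 1 -> high (hypothetical helper)."""
    return [high if v == 1 else low for v in values]

# Two identical dummy variables with some variation (made-up data).
x = [0, 0, 1, 1, 1, 0, 1]
y = list(x)

r_01 = pearson(x, y)                                       # coded 0/1
r_7_42 = pearson(recode(x, 7, 42), recode(y, 7, 42))       # coded 7/42
r_mixed = pearson(recode(x, 7, 42), recode(y, -3.5, 100))  # different codes per variable

print(r_01, r_7_42, r_mixed)
```

All three correlations should come out as 1, up to floating-point
rounding. The one caveat is ordering: swap which level gets the
larger number in one variable only and the correlation becomes -1,
which is still a perfect linear relationship.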
Nick
Paul Millar
> What fun this all is! Who'd have thought! Thanks for the
> fun with fundamentals!
>
> I think what Sam was getting at is that with binary
> variables, once you have the mean, you can throw away the
> data since the variance is directly derived from the mean.
> Nothing further is required, even to calculate confidence intervals.
>
> And I think Nick's response indicates why the level of
> measurement is relevant. If the LOM is nominal, there is no
> linear relationship, strictly speaking. Only when the scales
> are equi-interval does a linear relationship, and thus the
> correlation, make theoretical sense; the correlation being a
> summary of the linear relationship, as Nick points out.