A short question on PCA for categorical variables: wouldn't
correspondence analysis be useful here? Or is my interpretation of CA
as the categorical analog of PCA way off base?
Tony
Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, August 12, 2009 1:48 PM
To: [email protected]
Subject: Re: Re: st: Principal Components Analysis with count data
There are various unstated assumptions and criteria that need to be
spelled out for a fruitful discussion.
1. Continuous versus discrete. I know of no reason why PCA should be any
more, or any less, helpful on discrete data (e.g. counts) than on
continuous data. I wouldn't expect it to be useful for categorical
variables, which I take to be a quite different issue.
2. Skewed versus symmetric. In principle, PCA might work very well even
if some of the variables were highly skewed. In practice, skewness quite
often goes together with nonlinearities, and a transformation might help
in either case.
3. Whether PCA will work well does depend on what you expect it to do
ideally, which is not clear in the question.
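As a rough illustration of point 2, the sketch below builds two artificial count variables that are related nonlinearly and compares how much variance a first principal component would capture before and after a log transformation. It is only a sketch under stated assumptions: the data are made up, `log1p` is just one possible transformation, and the helper names `pearson` and `first_pc_share` are hypothetical. It relies on the fact that for two standardized variables with correlation r, the correlation matrix has eigenvalues 1 + |r| and 1 - |r|, so the first component accounts for (1 + |r|)/2 of the total variance.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_pc_share(r):
    """Share of total variance on the first PC of two standardized variables."""
    return (1 + abs(r)) / 2

# Made-up counts (an assumption, not data from this thread):
# x grows like exp(t) and y like exp(t / 2), so y is roughly sqrt(x).
t = [6 * i / 49 for i in range(50)]
x = [round(math.exp(v)) for v in t]
y = [round(math.exp(v / 2)) for v in t]

r_raw = pearson(x, y)
r_log = pearson([math.log1p(v) for v in x], [math.log1p(v) for v in y])

print(f"raw counts: r = {r_raw:.3f}, first PC share = {first_pc_share(r_raw):.3f}")
print(f"log1p:      r = {r_log:.3f}, first PC share = {first_pc_share(r_log):.3f}")
```

Which scale is the more useful one still depends on what the components are meant to summarize.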
Nick
[email protected]
Evans Jadotte <[email protected]>
I think a straightforward way to deal with this issue is to apply a
Multiple Correspondence Analysis (MCA) to your data. See Asselin (2002)
for an application, and also the references therein.
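To make the link to PCA a little more concrete: simple correspondence analysis decomposes the total inertia of a contingency table (the Pearson chi-squared statistic divided by the grand total) into axes, much as PCA decomposes total variance. The sketch below only computes that total inertia for a small made-up two-way table. It does not perform the decomposition itself, and it is not MCA, which works on an indicator or Burt matrix built from several categorical variables. The function name `total_inertia` and the table are hypothetical, and the function assumes no row or column of the table sums to zero.

```python
def total_inertia(table):
    """Total inertia of a two-way table: Pearson chi-squared / grand total."""
    n = sum(sum(row) for row in table)
    row_mass = [sum(row) / n for row in table]
    col_mass = [sum(col) / n for col in zip(*table)]
    inertia = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            expected = row_mass[i] * col_mass[j]  # proportion under independence
            observed = count / n
            inertia += (observed - expected) ** 2 / expected
    return inertia

# Made-up 3 x 3 table of counts (rows and columns are categories).
table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]
print(f"total inertia = {total_inertia(table):.4f}")
```

A table whose rows and columns are independent has zero inertia, so there is nothing for the correspondence analysis axes to display.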
Cameron McIntosh
> You should also check out chapters 8 and 9 of:
>
> Basilevsky, A. (1994). Statistical Factor Analysis and Related
> Methods: Theory and Applications. New York: Wiley.
[email protected]
>> I don't know much about this, but a while ago I was looking for
>> something similar and came across this paper, which helped me:
>>
>> http://cosco.hiit.fi/search/MPCA/buntineDPCA.pdf
>>
>> If that's not useful to you, it has a bunch of references in the
>> back. Maybe those can help.
Jason Ferris
>>> PCA is appropriate for continuous data, but I am wondering whether
>>> it is also appropriate for count data (i.e., highly skewed data).
>>> Can someone provide advice, guidance, or a resource on using PCA
>>> with count data?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/