Statalist



RE: st: Principal Components Analysis with count data


From   Cameron McIntosh <[email protected]>
To   STATA LIST <[email protected]>
Subject   RE: st: Principal Components Analysis with count data
Date   Thu, 13 Aug 2009 11:51:07 -0400

You can accommodate categorical variables within PCA itself:
 
Kolenikov, S., & Angeles, G. (2009). Socioeconomic Status Measurement with Discrete Proxy Variables: Is Principal Component Analysis a Reliable Answer? Review of Income and Wealth, 55(1), 128-165.
 
Cam

----------------------------------------
> Subject: RE: Re: st: Principal Components Analysis with count data
> Date: Thu, 13 Aug 2009 08:23:59 -0700
> From: [email protected]
> To: [email protected]
>
> A short question on PCA for categorical variables: wouldn't
> correspondence analysis be useful here? Or is my interpretation of CA
> as the categorical analog of PCA way off base?
>
> Tony
>
> Peter A. Lachenbruch
> Department of Public Health
> Oregon State University
> Corvallis, OR 97330
> Phone: 541-737-3832
> FAX: 541-737-4001
>
>
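On Tony's question: simple correspondence analysis is often described as a PCA-like decomposition of a two-way contingency table, obtained from a singular value decomposition of the standardized residuals. Below is a minimal sketch that assumes Python with NumPy and a hypothetical 3 x 3 table of counts, not data from this thread. It follows the textbook formulation and is not a reproduction of Stata's ca command or its normalization options. The function name correspondence_analysis is illustrative only.

```python
import numpy as np

def correspondence_analysis(N):
    """Simple correspondence analysis of a two-way contingency table N.

    Returns the principal inertias (squared singular values, descending)
    and the row principal coordinates (one column per dimension).
    """
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                      # correspondence matrix
    r = P.sum(axis=1)              # row masses
    c = P.sum(axis=0)              # column masses
    rc = np.outer(r, c)            # expected proportions under independence
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}, elementwise
    S = (P - rc) / np.sqrt(rc)
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    inertias = sigma ** 2
    # Row principal coordinates: D_r^{-1/2} U Sigma
    row_coords = (U * sigma) / np.sqrt(r)[:, None]
    return inertias, row_coords

# Hypothetical counts for illustration only
table = np.array([
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 25],
])

inertias, row_coords = correspondence_analysis(table)

# Check: total inertia equals Pearson chi-squared divided by n
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
chi2 = ((table - expected) ** 2 / expected).sum()
print("principal inertias:", np.round(inertias, 4))
print("total inertia equals chi2 / n:", np.isclose(inertias.sum(), chi2 / table.sum()))
```

The last principal inertia is numerically zero because centering removes one dimension. One standard formulation of MCA, as Evans suggests further down, applies this same decomposition to the indicator matrix of several categorical variables.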
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Wednesday, August 12, 2009 1:48 PM
> To: [email protected]
> Subject: Re: Re: st: Principal Components Analysis with count data
>
> There are various unstated assumptions and criteria that need to be
> spelled out for a fruitful discussion.
>
> 1. Continuous versus discrete. I don't know any reason why PCA might not
> be as helpful, or as useless, on discrete data (e.g. counts) as compared
> with continuous data. I wouldn't think it useful for categorical
> variables, which I take to be a quite different issue.
>
> 2. Skewed versus symmetric. In principle, PCA might work very well even
> if some of the variables were highly skewed. In practice, skewness quite
> often goes together with nonlinearities, and a transformation might help
> in either case.
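Nick's first two points can be illustrated with a minimal sketch. It assumes Python with NumPy, which the thread does not use; Statalist users would normally run Stata's pca instead. The sketch runs PCA on the correlation matrix, which handles a column of counts exactly like any other numeric column. It then repeats the analysis after a log(1 + y) transformation. That transformation is one common choice for right-skewed counts because it keeps zero counts finite, but it is an assumption here. The function names and the toy count matrix are hypothetical and serve only as an illustration.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables).

    Returns eigenvalues in descending order and the matching eigenvectors
    (loadings) as columns.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # variables are columns
    eigvals, eigvecs = np.linalg.eigh(R)    # R is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    return eigvals[order], eigvecs[:, order]

def explained_share(X):
    """Share of total variance carried by each component."""
    eigvals, _ = pca_correlation(X)
    return eigvals / eigvals.sum()

# Hypothetical counts: 8 observations on 3 count variables, one extreme row
counts = np.array([
    [0, 1, 0],
    [1, 0, 2],
    [0, 2, 1],
    [3, 4, 2],
    [1, 1, 0],
    [2, 3, 3],
    [0, 0, 1],
    [25, 30, 12],
])

raw_share = explained_share(counts)
log_share = explained_share(np.log1p(counts))   # log(1 + y) transformation
print("share of variance, raw counts:  ", np.round(raw_share, 3))
print("share of variance, log1p counts:", np.round(log_share, 3))
```

Comparing the two sets of shares can reveal how strongly a single extreme observation drives the raw-count solution. Whether the transformed version is preferable still depends on what the components are meant to do, which is Nick's point 3.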
>
> 3. Whether PCA will work well does depend on what you expect it to do
> ideally, which is not clear in the question.
>
> Nick
> [email protected]
>
>
> Evans Jadotte 
>
> I think a straightforward way to deal with this issue is to apply a
> Multiple Correspondence Analysis (MCA) to your data. See Asselin (2002)
> for an application, and also the references therein.
>
> Cameron McIntosh
>
>> You should also check out chapters 8 and 9 of:
>>
>> Basilevsky, A. (1994). Statistical Factor Analysis and Related
>> Methods: Theory and Applications. New York: Wiley.
>
> [email protected]
>
>>> I don't know much about this, but a while ago I was looking for
>>> something similar and came across this paper, which helped me:
>>>
>>> http://cosco.hiit.fi/search/MPCA/buntineDPCA.pdf
>>>
>>> If that's not useful to you, it has a bunch of references at the
>>> back. Maybe those can help.
>
> Jason Ferris
>
>>>> As PCA is appropriate for continuous data, I am wondering whether it is
>>>> appropriate for count data (i.e., highly skewed). Can someone provide
>>>> advice, guidance or a resource on using PCA with count data?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>


