
RE: Re: st: Principal Components Analysis with count data

From	"Verkuilen, Jay" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: Re: st: Principal Components Analysis with count data
Date	Fri, 14 Aug 2009 12:19:33 -0400

Nick Cox wrote:

>There are various unstated assumptions and criteria that need to be
>spelled out for a fruitful discussion. 

>1. Continuous versus discrete. I don't know any reason why PCA might not
>be as helpful, or as useless, on discrete data (e.g. counts) as compared
>with continuous data.

Agreed. The main thing is that discrete variables tend to be quite skewed and thus to have strongly attenuated correlations. Much of the dimensionality you find is created by this attenuation. The temptation is to assume that 

     dimension = substantively interesting variation, 

but sadly this is often wrong. Instead, 

     dimension = systematic variation, 

but that's far from the same thing. 
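A minimal simulation of the attenuation point, as a sketch under stated assumptions (not anything from Stata itself): two latent standard normals with correlation 0.7 are observed either directly or coarsened to a rare 0/1 event (latent value above a hypothetical cutoff of 1.5, about 7% of cases), which is the most extreme form of a low-mean, heavily skewed count. The names `attenuation_demo` and `pearson` are illustrative only.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def attenuation_demo(rho=0.7, cut=1.5, n=20000, seed=1):
    """Correlate two latent normals, then the same pair coarsened to rare 0/1 events.

    Hypothetical setup: z1, z2 are standard normal with correlation rho;
    each observed variable is 1 only when its latent value exceeds `cut`,
    i.e. a heavily skewed discrete variable.
    """
    rng = random.Random(seed)
    z1s, z2s, b1s, b2s = [], [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 shares correlation rho with z1 by construction
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        z1s.append(z1)
        z2s.append(z2)
        b1s.append(1.0 if z1 > cut else 0.0)
        b2s.append(1.0 if z2 > cut else 0.0)
    return pearson(z1s, z2s), pearson(b1s, b2s)


latent_r, observed_r = attenuation_demo()
print(f"latent r = {latent_r:.2f}, skewed 0/1 r = {observed_r:.2f}")
```

The observed correlation comes out well below the latent one even though the underlying relationship is identical. When items differ in how skewed they are, a PCA of such attenuated correlations can split them into separate "dimensions" that reflect skewness rather than substance.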

>I wouldn't think it useful for categorical
>variables, which I take to be a quite different issue.

Well, correspondence analysis is essentially principal components for categorical variables, in the sense that CA depends on the singular value decomposition of the (suitably standardized) indicator matrix for categorical data in much the same way that PCA (or biplotting) uses the SVD of the (centered) data matrix for continuous variables. There's a large literature on it, and Stata already has some nice built-in procedures for it. See -mca- and then expect to do some reading. 
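A dependency-free sketch of the SVD step in simple two-way CA, assuming a hypothetical 2 x 3 contingency table. Power iteration stands in for a full SVD routine such as `numpy.linalg.svd`, and the function names are illustrative. The squared singular values of the standardized residual matrix are the principal inertias, and together they sum to chi-square/n. MCA applies the same machinery to the indicator (or Burt) matrix instead of a two-way table.

```python
import math


def standardized_residuals(table):
    """Matrix S with S[i][j] = (p_ij - r_i c_j) / sqrt(r_i c_j), as used in CA."""
    n = sum(sum(row) for row in table)
    p = [[x / n for x in row] for row in table]
    r = [sum(row) for row in p]          # row masses
    c = [sum(col) for col in zip(*p)]    # column masses
    return [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]


def leading_singular_value(s, iters=500):
    """Largest singular value of s by power iteration on s^T s."""
    m = len(s[0])
    v = [float(j + 1) for j in range(m)]  # arbitrary non-constant start vector
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(m)) for row in s]              # s v
        w = [sum(s[i][j] * u[i] for i in range(len(s))) for j in range(m)]  # s^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(m)) for row in s]
    return math.sqrt(sum(x * x for x in u))  # |s v| for unit v is the singular value


# Hypothetical 2 x 3 contingency table (rows and columns are two categorical variables).
table = [[20, 10, 5],
         [5, 10, 20]]
s = standardized_residuals(table)
total_inertia = sum(x * x for row in s for x in row)  # equals chi-square / n
first_inertia = leading_singular_value(s) ** 2        # inertia on dimension 1
print(total_inertia, first_inertia)
```

With only two rows the residual matrix has rank 1, so the first dimension carries all of the inertia. Larger tables spread it over several dimensions, just as PCA spreads variance over components.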

>2. Skewed versus symmetric. In principle, PCA might work very well even
>if some of the variables were highly skewed. In practice, skewness quite
>often goes together with nonlinearities, and a transformation might help
>in either case.

Yup. 
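A toy check of the transformation point, assuming log(1 + x) as the transformation (one common choice for counts because it keeps zeros defined; the square root is another). The counts are made up for illustration, and `skewness` is a plain moment-based estimator without small-sample correction.

```python
import math


def skewness(xs):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed counts.
counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 20]
logged = [math.log1p(x) for x in counts]  # log(1 + x)

print(round(skewness(counts), 2), round(skewness(logged), 2))
```

The logged counts are far less skewed than the raw ones, which is the kind of change that can make correlations, and hence a PCA, less driven by a few extreme values.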

JV


References:
- Re: Re: st: Principal Components Analysis with count data
  - From: "Nick Cox" <[email protected]>
