Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Proportional Independent Variables
Date: Thu, 28 Feb 2013 13:32:06 +0000
You will have to fudge the zeros (#2) before you apply logratios (#1).
As before, a key question is: are they structural (inevitably 0) or
sampling (happen to be or to be reported as 0)?

I got some of the guts of this field coded up as Mata functions a
while back, but there is no documentation and that may not help much.

// compositional data analysis

mata :

mata drop cda_*()

// NJC 1 Sept 2008
// rows scaled to sum to 1
real matrix function cda_closure(real matrix X) {
        return(X :/ rowsum(X))
}

// NJC 1 Sept 2008
// ln(all but last column / last column)
real matrix function cda_alr(real matrix X) {
        real scalar c, cm1
        c = cols(X); cm1 = c - 1
        return(ln(X[, (1 .. cm1)]) :- ln(X[, c]))
}

// NJC 1 Sept 2008
// ln(all / row geometric means)
real matrix function cda_clr(real matrix X) {
        return(ln(X) :- mean(ln(X'))')
}

// NJC 1 Sept 2008
// centring
real matrix cda_centre(real matrix X) {
        real rowvector centre, invcentre
        centre = cda_closure(exp(mean(ln(X))))
        invcentre = cda_closure((1 :/ centre))
        return(cda_closure(X :* invcentre))
}

// NJC 3 Sept 2008
// column geometric means
real matrix cda_colgmean(real matrix X) {
        return(exp(mean(ln(X))))
}

// NJC 3 Sept 2008
// row geometric means
real matrix cda_rowgmean(real matrix X) {
        return(exp(mean(ln(X'))'))
}

// NJC 2 Sept 2008
// multiplicative replacement for rounded zeros
real matrix cda_mrzero(real matrix X, real rowvector delta,
        | real scalar total) {
        real matrix iszero
        if (total == .) total = 1
        iszero = X :== 0
        return((iszero :* delta) +
                ((!iszero) :* X :* (1 :- rowsum(iszero :* delta) :/ total)))
}

// NJC 10 Oct 2008
// isometric log-ratio transformation
real matrix function cda_ilr(real matrix X) {
        real scalar c, j
        real matrix Y, lnX
        c = cols(X)
        Y = X[, (1 .. c - 1)]; lnX = ln(X)
        for (j = 1; j < c; j++) {
                Y[, j] = rowsum(lnX[, (1 .. j)]) - j * lnX[, j + 1]
                Y[, j] = (1 / sqrt(j * (j + 1))) * Y[, j]
        }
        return(Y)
}

end

On Thu, Feb 28, 2013 at 1:19 PM, nick bungy
<nickbungystata@hotmail.co.uk> wrote:
> Thank you for your responses,
> My thoughts following this discussion are the following:
> 1. Apply a logratio transformation to the data in the short run
> 2. Look into a simplex mixture approach as a longer term aspiration, given my data does have a very large amount of 0's. I noticed the topic was mentioned in the book you kindly linked Nick, so that will be my first avenue to explore.
> Best,
> Nick
>
> ----------------------------------------
>> Date: Thu, 28 Feb 2013 07:35:23 -0500
>> Subject: Re: st: Proportional Independent Variables
>> From: jvverkuilen@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> On Thu, Feb 28, 2013 at 4:19 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> >
>> > 2. For different reasons log and logit transformations might be
>> > considered. There is a very inward-looking literature on compositional
>> > data analysis centred on more exotic transformations tailored to the
>> > problem. The reference I gave earlier is one entry into that.
>>
>> I was going to throw out the same reference. It's not a trivial
>> problem, but a narrow one due to the way it's been written. But the
>> walkaway message of most of it is that the log-ratio transformation is
>> the most reasonable one. This all just works out to being logit if you
>> only had two, or log-odds. The logic is very similar to the
>> multinomial logit, with the same difficult dependence structure.
>>
>> > 3. The two previous points are often complicated by measured zeros.
>> > There is then a long slow agony about whether they are structural or
>> > sampling zeros and what to do about them. The more components are
>> > measured, the worse this usually gets, whether it is fractions of a
>> > budget spent on different things, or proportions of a material by
>> > elements or compounds or particle size classes, or whatever.
>>
>> Yes, this is a real issue, and unfortunately the transformations used
>> can create huge outlier problems, just like log transforms do when
>> there's a 0 value.
>>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
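A minimal sketch of the order of operations recommended at the top of this message (fudge the zeros first, then take logratios), written with built-in Mata operators only so that it runs on its own. The two steps mirror the bodies of cda_mrzero() and cda_alr() from the message. The matrix X and the replacement values in delta are made-up illustrative numbers, not data from this thread, and the sketch assumes the zero is a sampling (rounded) zero, not a structural one.

```mata
mata:
// hypothetical 3-part compositions, one per row, each summing to 1
X = (0.6, 0.4, 0 \ 0.2, 0.3, 0.5)

// hypothetical small replacement values, one per component (an assumption)
delta = (0.005, 0.005, 0.005)

// step 1: multiplicative replacement of zeros (same formula as cda_mrzero()
// with total = 1): zeros become delta, nonzero parts are shrunk so that
// each row still sums to 1
iszero = X :== 0
Z = (iszero :* delta) + ((!iszero) :* X :* (1 :- rowsum(iszero :* delta)))
rowsum(Z)        // should be 1 in each row, up to rounding

// step 2: additive logratio (same formula as cda_alr()):
// ln(all but last column / last column)
Y = ln(Z[, (1, 2)]) :- ln(Z[, 3])
Y                // no missing values, because no zeros remain in Z
end
```

If the functions from the message have already been defined in the Mata session, the same result should come from cda_alr(cda_mrzero(X, delta)). Taking logs of X directly would instead give missing values in the row containing the zero, which is why the replacement has to come first.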