Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: analysis of mixture experiments
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: RE: analysis of mixture experiments
Date: Thu, 23 Sep 2010 12:27:15 +0100
You are correct. I am so used to seeing similar questions about response variables that I missed your very clear statement that the problem is on the other side.
There is a literature on _compositional data analysis_ that may help. Google that term for some references, including much material on the internet.
John Aitchison suggested various transformations for bundles of compositional variables. A while back I wrote Mata code for some, which I don't seem to have made public. Examples follow my signature and may serve at a minimum to show that they are straightforward to compute.
John A. Cornell has books on mixtures. Go to the Wiley website and search for "Cornell mixtures".
The main problem with most of the multivariate transformation methods I have seen is what to do with observed zeros for any of the components. Much of the compositional data analysis literature deals with geological examples in which it is plausible that an observed zero falls just below some detection limit and that it should be fudged upwards. Most of the examples I have looked at in my own fields of interest are not quite so simple, and zeros often appear to be real (exact, essential, structural, fixed).
Nick
[email protected]
// compositional data analysis
mata :

mata drop cda_*()

// NJC 1 Sept 2008
// rows scaled to sum to 1
real matrix function cda_closure(real matrix X) {
    return(X :/ rowsum(X))
}

// NJC 1 Sept 2008
// ln(all but last column / last column)
real matrix function cda_alr(real matrix X) {
    real scalar c, cm1
    c = cols(X); cm1 = c - 1
    return(ln(X[, (1 .. cm1)]) :- ln(X[, c]))
}

// NJC 1 Sept 2008
// ln(all / row geometric means)
real matrix function cda_clr(real matrix X) {
    return(ln(X) :- mean(ln(X'))')
}

// NJC 1 Sept 2008
// centring
real matrix function cda_centre(real matrix X) {
    real rowvector centre, invcentre
    centre = cda_closure(exp(mean(ln(X))))
    invcentre = cda_closure((1 :/ centre))
    return(cda_closure(X :* invcentre))
}

// NJC 3 Sept 2008
// column geometric means
real matrix function cda_colgmean(real matrix X) {
    return(exp(mean(ln(X))))
}

// NJC 3 Sept 2008
// row geometric means
real matrix function cda_rowgmean(real matrix X) {
    return(exp(mean(ln(X'))'))
}

// NJC 2 Sept 2008
// multiplicative replacement for rounded zeros
real matrix function cda_mrzero(real matrix X, real rowvector delta, | real scalar total) {
    real matrix iszero
    if (total == .) total = 1
    iszero = X :== 0
    return((iszero :* delta) + ((!iszero) :* X :* (1 :- rowsum(iszero :* delta) :/ total)))
}

// NJC 10 Oct 2008
// isometric log-ratio transformation
real matrix function cda_ilr(real matrix X) {
    real scalar c, j
    real matrix Y, lnX
    c = cols(X)
    Y = X[, (1 .. c - 1)]; lnX = ln(X)
    for (j = 1; j < c; j++) {
        Y[, j] = rowsum(lnX[, (1 .. j)]) - j * lnX[, j + 1]
        Y[, j] = (1 / sqrt(j * (j + 1))) * Y[, j]
    }
    return(Y)
}

end
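
A minimal usage sketch may help show how the functions above fit together. It assumes the cda_*() functions have already been defined in the current Mata session. The matrices X and Z and the replacement value 0.005 are made up purely for illustration; they are not from the original post, and the choice of delta in practice depends on the data.

```mata
mata :

// three observations on a three-part composition, here as percentages
// (hypothetical data)
X = (20, 30, 50 \ 10, 10, 80 \ 25, 25, 50)

// rescale rows to sum to 1
P = cda_closure(X)
rowsum(P)

// log-ratio transformations of the closed data
Palr = cda_alr(P)       // 3 x 2: ln(part 1 / part 3), ln(part 2 / part 3)
Pclr = cda_clr(P)       // 3 x 3: rows sum to 0, up to rounding
Pilr = cda_ilr(P)       // 3 x 2: isometric log-ratio coordinates
rowsum(Pclr)

// a composition with an observed zero: ln(0) is missing in Mata, so the
// zero is replaced before any log-ratio transformation is applied;
// delta = 0.005 for every part is an arbitrary illustrative choice
Z = (0, .4, .6 \ .2, .3, .5)
R = cda_mrzero(Z, (.005, .005, .005), 1)
R
rowsum(R)               // rows still sum to the stated total of 1

end
```

Multiplicative replacement is only defensible if the zeros can reasonably be treated as rounded values below a detection limit; for essential (structural) zeros none of these log-ratio transformations applies directly.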
Dan Kahan
thanks. I know dirifit; I am very fond of it. But here the proportions are my
IVs, not the DV, which is a continuous variable (one to which I would
ordinarily fit an OLS linear regression, except that that seems
intuitively wrong to me where my IVs are proportions).
On Wed, Sep 22, 2010 at 3:20 PM, Nick Cox <[email protected]> wrote:
>
> Look at -dirifit- from SSC.
>