I agree with Jay.
In addition, what to do with rounded zeros (i.e. zero was observed, but not expected) in compositional data is covered, among other things, in this collection of papers from the Geological Society [the original one, not the later U.S. society]. I had some fun coding the recommended fudge-nudge methods up as Mata functions.
Geosciences here indicates applications, not applicability.
Nick
[email protected]
% --- lightly edited from GSL website
Compositional Data Analysis in the Geosciences: From Theory to Practice
Series: GSL Special Publications
Ten Digit ISBN: 1-86239-205-6
Thirteen Digit ISBN: 978-1-86239-205-2
Author/Editor: Edited by A Buccianti, G Mateu-Figueras and V Pawlowsky-Glahn
Publisher: Geological Society of London
Publication Date: 16 October 2006
Binding: Hardback
Pages: 224
Weight: 0.83kg
£75.00 List price
Description
Since Karl Pearson wrote his paper on spurious correlation in 1897, a lot has been said about the statistical analysis of compositional data, mainly by geologists such as Felix Chayes. The solution appeared in the 1980s, when John Aitchison proposed to use logratios. Since then, the approach has seen a great expansion, mainly building on the idea of the 'natural geometry' of the sample space. Statistics is expected to give sense to our perception of the natural scale of the data, and this is made possible for compositional data using logratios. This publication will be a milestone in this process.
This book will be of interest to geologists using statistical methods. It includes the intuitive justification of the methodology, convincing through case studies and presenting user-friendly software, which includes a section for those who need to see the proof of the mathematical consistency of the methods used.
Contents
Compositional data and their analysis: an introduction, V Pawlowsky-Glahn and J J Egozcue
* Applications to the solution of real geological problems
* Major-oxide compositional discrimination in Cenozoic volcanites of Hungary, L Ó Kovács, G P Kovács, J A Martín-Fernández and C Barceló-Vidal
* Log-ratios and geochemical discrimination of Scottish Dalradian limestones: a case study, C W Thomas and J Aitchison
* Discriminating geodynamical regimes of tin ore formation using trace element composition of cassiterite: the Sikhote'Alin case (Far Eastern Russia), N Gorelikova, R Tolosana-Delgado, V Pawlowsky-Glahn, A Khanchuk and V Gonevchuk
* On stability of compositional canonical variate vector components, R A Reyment
* Compositional changes in a fumarolic field, Vulcano Island, Italy: a statistical case study, A Buccianti, F Tassi and O Vaselli
* Ternary sandstone composition and provenance: an evaluation of the 'Dickinson model', G J Weltje
* Software and related issues
* Detailed guide of CoDaPack: a freeware compositional software, S Thió-Henestrosa and J A Martín-Fernández
* Compositional data analysis with 'R' and the package 'compositions', K G van der Boogaart and R Tolosana-Delgado
* Visualization of three- and four- part (sub)compositions with 'R', M Bren and V Batagelj
* General theory and methods
* Simplicial geometry for compositional data, J J Egozcue and V Pawlowsky-Glahn
* Exploratory compositional data analysis, J Daunis-i-Estadella, C Barceló-Vidal and A Buccianti
* Frequency distributions and natural laws in geochemistry, A Buccianti, G Mateu-Figueras and V Pawlowsky-Glahn
* Rounded zeros: some practical aspects for compositional data, J A Martín-Fernández and S Thió-Henestrosa
* Is the simplex open or closed? (some topological concepts), E Barrabés and G Mateu-Figueras
Reviews
In conclusion, I highly recommend this very useful book to any geologist (or indeed any scientist) interested in how log-ratio methods can facilitate better statistical analysis.
Published online: 26 October 2007 Springer-Verlag 2007
Stoch Environ Res Risk Assess (2008) 22:139-141
This review was submitted by:
John Bacon-Shone
21 July 2008
This book will be valuable for geoscientists and the statisticians who advise them.
Review featured in Journal of the Royal Statistical Society Series A, Vol 171, Part 1, 2008
This review was submitted by:
R.M. Lark, Rothamsted Research, Harpenden
25 July 2008
% end of stuff from GSL website
jverkuilen
You really need to read the Aitchison book Maarten cited. Get the 2003 edition, which has an extra chapter in the back.
How well a model like -dirifit- or the logistic-normal that Aitchison prefers can cope with boundaries is tricky. Much depends on the meaning of the zero or one observation. If it is due to rounding error, the procedure recommended by Aitchison is to shrink everything into the simplex uniformly by a small amount, eps. This turns out to work quite well and does very little damage to your estimates, though Aitchison's model can end up with outliers if you are not careful and pick a bad value of eps.
If the zero or one represents something qualitatively different from an observation of eps > 0, you have a bigger problem.
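For the rounding-error case, here is a minimal sketch of one way to read "shrink everything into the simplex uniformly". It is written in Python purely for illustration. The function name shrink_into_simplex, the default eps, and the particular formula x' = (1 - D*eps)*x + eps are my assumptions, not necessarily the exact procedure in Aitchison's book. Every part ends up at least eps and the parts still sum to 1, so log-ratios are defined afterwards.

```python
import math


def shrink_into_simplex(composition, eps=1e-3):
    """Pull a composition uniformly toward the centre of the simplex.

    Assumed formula (illustrative only): x' = (1 - D*eps) * x + eps,
    where D is the number of parts. Parts that were 0 become eps,
    and the shrunken parts still sum to 1.
    """
    d = len(composition)
    if not 0 < eps < 1 / d:
        raise ValueError("eps must lie strictly between 0 and 1/D")
    if any(x < 0 for x in composition):
        raise ValueError("parts must be non-negative")
    if abs(sum(composition) - 1.0) > 1e-9:
        raise ValueError("parts must sum to 1")
    scale = 1 - d * eps
    return [scale * x + eps for x in composition]


def additive_logratio(composition):
    """Log of each part relative to the last part; needs all parts > 0."""
    last = composition[-1]
    return [math.log(x / last) for x in composition[:-1]]


raw = [0.0, 0.3, 0.7]                 # first part is a rounded zero
nudged = shrink_into_simplex(raw, eps=0.01)
logratios = additive_logratio(nudged)  # finite, unlike log(0 / 0.7)
```

Rerunning an analysis over a few values of eps is a cheap way to see whether the choice is creating the outliers mentioned above.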
From: "Murali Kuchibhotla" <[email protected]>
Thank you Maarten. It turns out that the dependent variables that I am trying to
model (which are in the nature of proportions) take values which include 0 and 1.
So dirifit seems inappropriate for this particular application. In your
presentation however, you show that the fractional logit model can handle this
constraint for the single dependent variable case. Does this also hold when
modelling multiple dependent variables?