CoDaPack: A tool for Compositional Data Analysis
M. Comas-Cufí & S. Thió-Henestrosa ([email protected])
Dept. Computer Sciences and Applied Mathematics
University of Girona (UdG), Catalonia-Spain

What’s coda?
• Vector x = [x1, x2, …, xD]
• Adds to a constant: 100, 1, 10^6, 10^9, … (units: percentage, parts per one, ppm, ppb, …)
• Has positive elements
• Carries only relative information
• Examples
  – Production (pieces): [Ok, NonOk, Rework] = [87, 1, 12]
  – Household budget (€): [Food, Serv., Other] = [1150, 623, 351]
  – Daily activities (h): [Work, Sleep, Other] = [7.5, 7.5, 9]

Sample space of coda: simplex
• Compositional data live in the simplex (S), represented in a ternary (D=3), quaternary (D=4), … diagram
  – D=3, S^3: x = [0.45, 0.35, 0.2]
  – D=4, S^4: x = [0.2, 0.25, 0.2, 0.35]
[Figure: ternary diagram (D=3) and quaternary diagram (D=4) showing the two points above]

Euclidean distance appropriate?
[Figure: two ternary diagrams, factories A and B, with vertices STOP PROD., HALF PROD., NON-STOP PROD.]
A2009 = [0.2, 0.1, 0.7]
A2010 = [0.1, 0.2, 0.7]
B2009 = [0.4, 0.3, 0.3]
B2010 = [0.3, 0.4, 0.3]
A2010 - A2009 = B2010 - B2009 = [-0.1, 0.1, 0]
de(A) = de(B) = 0.14 measures the absolute difference

Euclidean distance appropriate? (cont.)
[Figure: the same two ternary diagrams, annotated with the 2009 and 2010 values of each factory]
Relative change from 2009 to 2010:
                  Factory A   Factory B
Stop Prod.          -50%        -25%
Half Prod.         +100%       +33.3%
Non-Stop Prod.        0%          0%

Euclidean distance appropriate? (cont.)
Our interest lies in relative values
A2010/A2009 = [1/2, 2, 1]
B2010/B2009 = [3/4, 4/3, 1]
Euclidean distance: de(A) = de(B) = 0.14
Aitchison distance: da(A) = 0.6276, da(B) = 0.3970
[Figure: ternary diagram with vertices STOP PROD., HALF PROD., NON-STOP PROD. showing A2009, A2010, B2009, B2010]

Classical multivariate normal model appropriate?

Log-ratio methodology
• Aitchison geometry on compositional data is equivalent to classical Euclidean geometry on log-ratio values.
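As a minimal sketch of this equivalence, the Python snippet below applies the centred log-ratio (clr) transform to the two-factory example and takes the ordinary Euclidean distance between the transformed vectors. The clr-based definition of the Aitchison distance is the standard one, but it is an assumption that it is the convention used on the slides, and the helper names (`clr`, `euclidean_distance`, `aitchison_distance`) are illustrative only. The values printed are not guaranteed to match the da figures quoted above.

```python
import math


def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def aitchison_distance(x, y):
    """Aitchison distance, taken here as the Euclidean distance between clr vectors."""
    return euclidean_distance(clr(x), clr(y))


# Compositions of the two factories: [stop, half, non-stop] production.
a_2009, a_2010 = [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]
b_2009, b_2010 = [0.4, 0.3, 0.3], [0.3, 0.4, 0.3]

de_a = euclidean_distance(a_2009, a_2010)
de_b = euclidean_distance(b_2009, b_2010)
da_a = aitchison_distance(a_2009, a_2010)
da_b = aitchison_distance(b_2009, b_2010)

# The Euclidean distance cannot tell the two factories apart;
# the log-ratio distance reports the larger relative change in factory A.
print(f"Euclidean: de(A) = {de_a:.4f}, de(B) = {de_b:.4f}")
print(f"Aitchison: da(A) = {da_a:.4f}, da(B) = {da_b:.4f}")
```

Because the clr transform works on ratios only, rescaling a composition (for example, expressing it in percentages instead of parts per one) leaves the Aitchison distance unchanged.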
Simplex (restricted space): [x1, …, xD]
Real space (unrestricted): log(xi/xj), i, j = 1, …, D, j ≠ i

CoDaPack 2

Software
• CoDaPack: software developed by the Department of Computer Science and Applied Mathematics at the Universitat de Girona. Easy and intuitive. http://ima.udg.edu/codapack [email protected]
• compositions (R package): analysis of compositional and positive data using different approaches. http://cran.r-project.org/ [email protected]
• robCompositions (R package): robust estimation for compositional data. http://cran.r-project.org/ [email protected]

References
• Aitchison, J., 1986. The Statistical Analysis of Compositional Data. Chapman & Hall, London. Reprinted in 2003 with additional material by Blackburn Press.
• Proceedings of CoDaWork (2003, 2005, 2008, 2011): available at http://dugi-doc.udg.edu/handle/10256/150.
• CoDaWeb: Compositional Data Analysis Web Site: http://www.compositionaldata.com/