Log-ratio methods in mixture models for compositional data sets

  • M. Comas-Cufí [1] ; J.A. Martín-Fernández [1] ; G. Mateu-Figueras [1]
    1. [1] Universitat de Girona, Girona, Spain

  • Source: SORT: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 40, No. 2, 2016, pp. 349-374
  • Language: English
  • Abstract
    • When traditional methods are applied to compositional data, misleading and incoherent results can be obtained. Finite mixtures of multivariate distributions are becoming increasingly important. In this paper, traditional strategies for fitting a mixture model to compositional data sets are revisited and the major difficulties are detailed. A new proposal using a mixture of distributions defined on orthonormal log-ratio coordinates is introduced. A real data set analysis is presented to illustrate and compare the different methodologies.
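
      As a rough illustration of what "orthonormal log-ratio coordinates" means in the abstract, the sketch below maps compositions to isometric log-ratio (ilr) coordinates and back. It is not the authors' code: the function names (`closure`, `ilr_basis`, `ilr`, `ilr_inv`), the Helmert-type basis, and the toy data are assumptions chosen for illustration, and all parts are assumed to be strictly positive.

      ```python
      import numpy as np

      def closure(x):
          """Rescale each row so that its parts sum to 1."""
          x = np.asarray(x, dtype=float)
          return x / x.sum(axis=-1, keepdims=True)

      def ilr_basis(D):
          """Return a (D-1, D) matrix with orthonormal, zero-sum rows (Helmert-type basis)."""
          V = np.zeros((D - 1, D))
          for i in range(1, D):
              V[i - 1, :i] = 1.0 / i
              V[i - 1, i] = -1.0
              V[i - 1] *= np.sqrt(i / (i + 1.0))  # normalise the row to unit length
          return V

      def ilr(x, V):
          """Map compositions (rows of x, all parts > 0) to D-1 real coordinates."""
          logx = np.log(closure(x))
          clr = logx - logx.mean(axis=-1, keepdims=True)  # centred log-ratio
          return clr @ V.T

      def ilr_inv(z, V):
          """Map ilr coordinates back to compositions on the simplex."""
          return closure(np.exp(np.asarray(z) @ V))

      # Toy three-part compositions (hypothetical data, for illustration only).
      X = closure([[1.0, 2.0, 7.0], [3.0, 3.0, 4.0], [0.2, 0.5, 0.3]])
      V = ilr_basis(X.shape[1])
      Z = ilr(X, V)  # unconstrained coordinates in R^(D-1)
      ```

      The coordinates `Z` live in ordinary real space, so a standard mixture model (for example, a Gaussian mixture) could in principle be fitted to them and the fitted components mapped back with `ilr_inv`. Which component distributions the paper actually uses is not stated in the abstract.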

  • References
    • Aitchison, J. (1982). The statistical analysis of compositional data (with discussion). Journal of the Royal Statistical Society: Series B...
    • Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall, London (UK). Reprinted in 2003 by Blackburn Press.
    • Aitchison, J. (1999). Logratios and natural laws in compositional data analysis. Mathematical Geology, 31, 563–580.
    • Aitchison, J. (2002). Simplicial inference. In Algebraic Methods in Statistics and Probability (ed. Viana MA and Richards DS), vol 287. Contemporary...
    • Albert, J. H. and Gupta, A. K. (1982). Mixtures of Dirichlet distributions and estimation in contingency tables. The Annals of Statistics,...
    • Andrews, J. L. and McNicholas, P. D. (2012). Model-based clustering, classification, and discriminant analysis via mixtures of multivariate...
    • Azzalini, A. and Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical...
    • Azzalini, A. and Capitanio, A. (2003). Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution....
    • Bache, K. and Lichman, M. (2013). UCI machine learning repository. [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,...
    • Banerjee, A., Dhillon, I. S., Ghosh, J., and Sra, S. (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal...
    • Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.
    • Barceló-Vidal, C., Martín-Fernández, J. A., and Pawlowsky-Glahn, V. (1999). Comment on “Singularity and nonnormality in the classification...
    • Bickel, S. and Scheffer, T. (2004). Multi-view clustering. In Rastogi, R., Morik, K., Bramer, M., and Wu, X., editors, ICDM 2004, fourth IEEE...
    • Bouguila, N. (2011). Count data modeling and classification using finite mixtures of distributions. IEEE Transactions on Neural Networks,...
    • Bouguila, N., Ziou, D. and Vaillancourt, J. (2004). Unsupervised learning of a finite mixture model based on the Dirichlet distribution and...
    • Bouveyron, C. and Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: a review. Computational Statistics and Data...
    • Browne, R. P. and McNicholas, P. D. (2013). A mixture of generalized hyperbolic distributions. ArXiv e-prints arXiv:1305.1036
    • Buccianti, A. (2011). Natural Laws Governing the Distribution of the Elements in Geochemistry: The Role of the Log-Ratio Approach, 255–266....
    • Calif, R., Emilion, R. and Soubdhan, T. (2011). Classification of wind speed distributions using a mixture of Dirichlet distributions. Renewable...
    • Celeux, G. and Govaert, G. (1992). A classification EM algorithm for clustering and two stochastic versions. Computational Statistics &...
    • Connor, R. J. and Mosimann, J. E. (1969). Concepts of independence for proportions with a generalization of the Dirichlet distribution. Journal...
    • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal...
    • Egozcue, J. J., Pawlowsky-Glahn, V., Mateu-Figueras, G., and Barceló-Vidal, C. (2003). Isometric logratio transformations for compositional...
    • Egozcue, J. J. and Pawlowsky-Glahn, V. (2005). Groups of Parts and Their Balances in Compositional Data Analysis. Mathematical Geology, 37,...
    • Eilers, P.H.C., Marx, B.D. and Durban, M. (2015). Twenty years of P-splines. SORT, 39, 149–186.
    • Ferrer-Rosell, B., Coenders, G., and Martínez-García, E. (in press). Segmentation by tourist expenditure composition. An approach with...
    • Giordan, M. and Wehrens, R. (2015). A comparison of computational approaches for maximum likelihood estimation of the Dirichlet parameters...
    • Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
    • Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
    • Lee, S. X. and McLachlan, G. J. (2011). On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm. ArXiv e-prints...
    • Lee, S. X. and McLachlan, G. J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and...
    • Lin, T. I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20, 343–356.
    • Mardia, K. V., Taylor, C. C. and Subramaniam, G. K. (2007). Protein bioinformatics and mixtures of bivariate von Mises distributions for angular...
    • Martín-Fernández, J. A., Daunis-i-Estadella, J. and Mateu-Figueras, G. (2015). On the interpretation of differences between groups for...
    • Mateu-Figueras, G. and Pawlowsky-Glahn, V. (2007). The skew-normal distribution on the simplex. Communications in Statistics-Theory and Methods,...
    • Mateu-Figueras, G., Pawlowsky-Glahn, V. and Egozcue, J. J. (2011). The principle of working on coordinates. In Compositional Data Analysis,...
    • Mateu-Figueras, G., Pawlowsky-Glahn, V. and Egozcue, J. J. (2013). The normal distribution in some constrained sample spaces. SORT, 37, 29–56.
    • McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models, Wiley Series in Probability and Statistics. John Wiley and Sons, New York.
    • Monti, G. S., Mateu-Figueras, G. and Pawlowsky-Glahn, V. (2011). Notes on the scaled Dirichlet distribution. In Compositional Data Analysis,...
    • Monti, G. S., Mateu-Figueras, G., Pawlowsky-Glahn, V., and Egozcue, J. J. (2011). The shifted-scaled Dirichlet distribution in the simplex....
    • Narayanan, A. (1991). Algorithm AS 266: maximum likelihood estimation of the parameters of the Dirichlet distribution. Journal of the Royal...
    • Ng, K. W., Tian, G.-L. and Tang, M.-L. (2011). Dirichlet and Related Distributions: Theory, Methods and Applications. John Wiley and Sons.
    • Neocleous, T., Aitken, C. and Zadora, G. (2011). Transformations for compositional data with zeros with an application to forensic evidence...
    • Ongaro, A. and Migliorati, S. (2013). A generalization of the Dirichlet distribution. Journal of Multivariate Analysis, 114, 412–426.
    • Palarea-Albaladejo, J., Martín-Fernández, J. A., and Buccianti, A. (2014). Compositional methods for estimating elemental concentrations...
    • Palarea-Albaladejo, J., Martín-Fernández, J. A., and Soto, J. A. (2012). Dealing with distances and transformations for fuzzy C-means...
    • Papageorgiou, I., Baxter, M. J. and Cau, M. A. (2001). Model-based cluster analysis of artefact compositional data. Archaeometry, 43, 571–588.
    • Pawlowsky-Glahn, V. and Egozcue, J. J. (2001). Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research...
    • Prates, M. O., Lachos, V. H. and Cabral, C. R. B. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal...
    • R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
    • Rayens, W. S. and Srinivasan, C. (1994). Dependence properties of generalized Liouville distributions on the Simplex. Journal of the American...
    • Reimann, C., Filzmoser, P., Garrett, R., and Dutter, R. (2011). Statistical Data Analysis Explained: Applied Environmental Statistics with...
    • Scealy, J. L., de Caritat, P., Grunsky, E. C., Tsagris, M. T. and Welsh, A. H. (2015). Robust principal component analysis for power transformed...
    • Scott, A. and Symons, M. (1971). Clustering methods based on likelihood ratio criteria. Biometrics, 27, 387–397.
    • Smith, B. and Rayens, W. (2002). Conditional generalized Liouville distributions on the simplex. Statistics, 36, 185–194.
    • Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer-Verlag, New York.
    • Vives-Mestres, M., Daunis-i-Estadella, J. and Martín-Fernández, J. A. (2014). Individual T-2 control chart for compositional data. Journal...
    • Wang, H., Liu, Q., Mok, H. M. K., Fu, L. and Tse, W. M. (2007). A hyperspherical transformation forecasting model for compositional data....
