Modelling count data using the logratio-normal-multinomial distribution

  • Marc Comas-Cufí [1]; Josep Antoni Martín-Fernández [1]; Glòria Mateu-Figueras [1]; Javier Palarea-Albaladejo [1]
    1. [1] Universitat de Girona, Girona, Spain

  • Source: SORT: Statistics and Operations Research Transactions, ISSN 1696-2281, Vol. 44, No. 1, 2020, pp. 99-126
  • Language: English
  • DOI: 10.2436/20.8080.02.96
  • Abstract
    • The logratio-normal-multinomial distribution is a count data model that results from compounding a multinomial distribution for the counts with a multivariate logratio-normal distribution for the multinomial event probabilities. However, the logratio-normal-multinomial probability mass function does not admit a closed-form expression and, consequently, numerical approximation is required for parameter estimation. In this work, different estimation approaches are introduced and evaluated. We conclude that estimation based on a quasi-Monte Carlo Expectation-Maximisation algorithm provides the best overall results. Building on this, the performances of the Dirichlet-multinomial and logratio-normal-multinomial models are compared through a number of examples using simulated and real count data.
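
To illustrate why numerical approximation is needed, the following minimal sketch approximates the logratio-normal-multinomial probability mass function by averaging multinomial probabilities over quasi-random draws of the logratio coordinates. It is not the authors' estimation algorithm. The additive logratio (alr) parametrisation, the Halton point set, and every function name (`radical_inverse`, `cholesky`, `alr_inverse`, `multinomial_pmf`, `lnm_pmf`) are assumptions made for this example only.

```python
import math
from statistics import NormalDist

PRIMES = (2, 3, 5, 7, 11, 13)  # Halton bases; limits this sketch to 6 coordinates


def radical_inverse(i, base):
    """i-th term (i >= 1) of the van der Corput sequence in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def alr_inverse(h):
    """Map D-1 additive-logratio coordinates to D probabilities (assumed parametrisation)."""
    m = max(max(h), 0.0)  # shift exponents for numerical stability
    e = [math.exp(v - m) for v in h] + [math.exp(-m)]
    t = sum(e)
    return [v / t for v in e]


def multinomial_pmf(x, p):
    """Multinomial probability of the count vector x given probabilities p."""
    logp = math.lgamma(sum(x) + 1) - sum(math.lgamma(k + 1) for k in x)
    logp += sum(k * math.log(q) for k, q in zip(x, p) if k > 0)
    return math.exp(logp)


def lnm_pmf(x, mu, sigma, n_points=2000):
    """Quasi-Monte Carlo approximation of P(X = x): the multinomial pmf is
    averaged over Halton points mapped to N(mu, sigma) logratio coordinates."""
    d = len(mu)
    low = cholesky(sigma)
    inv_cdf = NormalDist().inv_cdf
    total = 0.0
    for i in range(1, n_points + 1):
        z = [inv_cdf(radical_inverse(i, PRIMES[j])) for j in range(d)]
        h = [mu[r] + sum(low[r][c] * z[c] for c in range(r + 1)) for r in range(d)]
        total += multinomial_pmf(x, alr_inverse(h))
    return total / n_points


# Illustrative parameters for three categories (two logratio coordinates).
mu = [0.5, -0.2]
sigma = [[1.0, 0.3], [0.3, 0.5]]
print(lnm_pmf([3, 1, 2], mu, sigma))
```

Because the same point set is reused for every count vector, the approximated probabilities over all outcomes with a fixed total sum to one up to rounding error. An estimation routine would need such an approximation inside each iteration, which is why the choice of integration scheme matters in practice.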

