Automated learning of mixtures of factor analysis models with missing information

  • Wan-Lun Wang [2]; Tsung-I Lin [1]
    1. [1] National Chung Hsing University, Taiwan
    2. [2] Department of Statistics, Graduate Institute of Statistics and Actuarial Science, Feng Chia University, Taichung, 40724, Taiwan
  • Source: Test: An Official Journal of the Spanish Society of Statistics and Operations Research, ISSN-e 1863-8260, ISSN 1133-0686, Vol. 29, No. 4, 2020, pp. 1098–1124
  • Language: English
  • DOI: 10.1007/s11749-020-00702-6
  • Abstract
    • The mixture of factor analyzers (MFA) model has emerged as a useful tool for dimensionality reduction and model-based clustering of heterogeneous data. To find the most appropriate number of factors (q) of an MFA model with the number of components (g) fixed a priori, a two-stage procedure is commonly used: parameters are first estimated over a set of prespecified numbers of factors, and the best q is then selected according to a penalized likelihood criterion. As the dimensionality of the data grows, such a procedure can become computationally prohibitive. To overcome this obstacle, we develop an automated learning scheme, called the automated MFA (AMFA) algorithm, that effectively merges parameter estimation and selection of q into a one-stage algorithm. The proposed AMFA procedure, which has a much lower computational cost, is also extended to accommodate missing values. Moreover, we explicitly derive the score vector and the empirical information matrix for calculating the standard errors of the estimated parameters. The potential and applicability of the proposed method are demonstrated on several real datasets with genuine and synthetic missing values.

  • References
    • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49:803–821
    • Basford KE, Greenway DR, McLachlan GJ, Peel D (1997) Standard errors of fitted means under normal mixture models. Comput Stat 12:1–17
    • Biernacki C, Celeux G, Govaert G (2000) Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans Pattern...
    • Boldea O, Magnus JR (2009) Maximum likelihood estimation of the multivariate normal mixture model. J Am Stat Assoc 104:1539–1549
    • Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
    • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). J R Stat Soc B 39:1–38
    • Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat...
    • Efron B, Tibshirani R (1993) An introduction to the bootstrap. Chapman & Hall, London
    • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588
    • Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631
    • Frank A, Asuncion A (2010) UCI machine learning repository. School of Information and Computer Science, University of California, Irvine,...
    • Ghahramani Z, Beal MJ (2000) Variational inference for Bayesian mixture of factor analysers. In: Solla S, Leen T, Muller K-R (eds) Advances...
    • Ghahramani Z, Jordan MI (1994) Supervised learning from incomplete data via an EM approach. In: Cowan JD, Tesauro G, Alspector J (eds) Advances...
    • Ghahramani Z, Hinton GE (1997) The EM algorithm for mixtures of factor analyzers. Technical report no. CRG-TR-96-1, University of Toronto
    • Golub GH, Van Loan CF (1989) Matrix computations, 2nd edn. Johns Hopkins University Press, Baltimore, MD
    • Ibrahim JG, Zhu H, Tang N (2008) Model selection criteria for missing data problems via the EM algorithm. J Am Stat Assoc 103:1648–1658
    • Keribin C (2000) Consistent estimation of the order of mixture models. Sankhya 62:49–66
    • Lattin J, Carroll JD, Green PE (2003) Analyzing multivariate data. Brooks/Cole, Pacific Grove, CA
    • Ledermann W (1937) On the rank of the reduced correlational matrix in multiple-factor analysis. Psychometrika 2:85–93
    • Lin TI, Lee JC, Ho HJ (2006) On fast supervised learning for normal mixture models with missing information. Pattern Recogn 39:1177–1187
    • Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
    • McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
    • McLachlan GJ, Bean RW, Peel D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18:413–422
    • McLachlan GJ, Peel D, Bean RW (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41:379–388
    • McLachlan GJ, Bean RW, Jones LBT (2007) Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution....
    • Meng XL, van Dyk D (1997) The EM algorithm—an old folk-song sung to a fast new tune. J Roy Stat Soc B 59:511–567
    • Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
    • Montanari A, Viroli C (2011) Maximum likelihood estimation of mixtures of factor analyzers. Comput Stat Data Anal 55:2712–2723
    • Redner RA, Walker HF (1984) Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev 26:195–239
    • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
    • Stephens M (2000) Dealing with label switching in mixture models. J Roy Stat Soc B 62:795–809
    • Ueda N, Nakano R, Ghahramani Z, Hinton GE (2000) SMEM algorithm for mixture models. Neural Comput 12:2109–2128
    • Wang WL, Lin TI (2013) An efficient ECM algorithm for maximum likelihood estimation in mixtures of t-factor analyzers. Comput Stat 28:751–769
    • Wang WL, Lin TI (2015) Robust model-based clustering via mixtures of skew-t distributions with missing information. Adv Data Anal Classif...
    • Zhang K, Fan W (2008) Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowl Inf Syst 14:299–326
    • Zhao JH, Shi L (2014) Automated learning of factor analysis with complete and incomplete data. Comput Stat Data Anal 72:205–218
    • Zhao JH, Yu PLH (2008) Fast ML estimation for the mixture of factor analyzers via an ECM algorithm. IEEE Trans Neural Netw 19:1956–1961
    • Zhao JH, Yu PLH, Jiang Q (2008) ML estimation for factor analysis: EM or non-EM? Stat Comput 18:109–123
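For context, the conventional two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' AMFA algorithm: it assumes each component has its own loading matrix and diagonal uniqueness matrix, uses BIC (Schwarz 1978) as the penalized likelihood criterion, restricts candidate q values by Ledermann's (1937) bound, and takes the maximized log-likelihood for each candidate q as already computed. In practice each of those values requires a separate EM-type fit, which is the repeated cost the one-stage AMFA scheme is meant to avoid. The function names and the log-likelihood values in the usage example are hypothetical placeholders, not results from the paper.

```python
import math


def mfa_num_params(p: int, q: int, g: int) -> int:
    """Free parameters of a g-component MFA with q factors in p dimensions.

    Assumption: component-specific loadings B_i (p x q) and diagonal Psi_i.
    """
    mixing = g - 1                             # proportions sum to one
    means = g * p                              # one mean vector per component
    loadings = g * (p * q - q * (q - 1) // 2)  # B_i identified up to rotation
    uniquenesses = g * p                       # diagonal of each Psi_i
    return mixing + means + loadings + uniquenesses


def ledermann_max_q(p: int) -> int:
    """Largest q with (p - q)^2 >= p + q, i.e. q within Ledermann's bound."""
    q = 0
    while (p - (q + 1)) ** 2 >= p + (q + 1):
        q += 1
    return q


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Schwarz's criterion in smaller-is-better form: -2 log L + d log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def select_q(logliks: dict[int, float], p: int, g: int, n_obs: int) -> int:
    """Stage two: pick the admissible q with the smallest BIC.

    `logliks` maps each candidate q to the maximized log-likelihood from
    stage one (one full model fit per q, supplied by the caller).
    """
    q_max = ledermann_max_q(p)
    scores = {
        q: bic(ll, mfa_num_params(p, q, g), n_obs)
        for q, ll in logliks.items()
        if 1 <= q <= q_max
    }
    if not scores:
        raise ValueError("no candidate q satisfies Ledermann's bound")
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for p = 10 variables, g = 2, n = 200.
    fitted = {1: -2500.0, 2: -2400.0, 3: -2390.0}
    print(select_q(fitted, p=10, g=2, n_obs=200))  # prints 2
```

With these placeholder values q = 2 is selected: moving from q = 2 to q = 3 gains 10 log-likelihood units (20 on the -2 log L scale) but adds 16 free parameters, each penalized by log(200) ≈ 5.3.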