Robust model-based clustering with mild and gross outliers

  • Alessio Farcomeni [2] ; Antonio Punzo [1]
    1. [1] University of Catania, Catania, Italy
    2. [2] Department of Economics and Finance, University of Rome “Tor Vergata”, Via Columbia 2, 00133, Rome, Italy
  • Published in: Test: An Official Journal of the Spanish Society of Statistics and Operations Research, ISSN-e 1863-8260, ISSN 1133-0686, Vol. 29, No. 4, 2020, pp. 989-1007
  • Language: English
  • DOI: 10.1007/s11749-019-00693-z
  • Abstract
    • We propose a model-based clustering procedure where each component can take into account cluster-specific mild outliers through a flexible distributional assumption, and a proportion of observations is additionally trimmed. We propose a penalized likelihood approach for estimation and selection of the proportions of mild and gross outliers. A theoretically grounded penalty parameter is then obtained. Simulation studies illustrate the advantages of our procedure over flexible mixtures without trimming, and over trimmed normal mixture models (tclust). We conclude with an original real data example on the identification of the source from illicit drug shipments seized in Italy and Spain. The methodology proposed in this paper has been implemented in R functions which can be downloaded from https://github.com/afarcome/cntclust.

  • References
    • Aitkin M, Wilson GT (1980) Mixture models, outliers, and the EM algorithm. Technometrics 22(3):325–331
    • Andrews J, Wickins J, Boers N, McNicholas P (2018) teigen: an R package for model-based clustering and classification via the multivariate...
    • Atkinson AC, Riani M, Cerioli A (2018) Cluster detection and clustering with random start forward searches. J Appl Stat 45(5):777–798
    • Bagnato L, Punzo A, Zoia MG (2017) The multivariate leptokurtic-normal distribution and its application in model-based clustering. Can J Stat...
    • Banfield JD, Raftery AE (1993) Model-based Gaussian and non-Gaussian clustering. Biometrics 49(3):803–821
    • Bryant P (1991) Large-sample results for optimization-based clustering methods. J Classif 8:31–44
    • Cabral CSB, Lachos VH, Prates MO (2012) Multivariate mixture modelling using skew-normal independent distributions. Comput Stat Data Anal...
    • Cerioli A (2010) Multivariate outlier detection with high-breakdown estimators. J Am Stat Assoc 105:147–156
    • Cerioli A, Farcomeni A, Riani M (2019) Wild adaptive trimming for robust estimation and cluster analysis. Scand J Stat 46:235–256
    • Cerioli A, García-Escudero LA, Mayo-Iscar A, Riani M (2018) Finding the number of normal groups in model-based clustering via constrained...
    • Cerioli A, Riani M, Atkinson AC, Corbellini A (2018) The power of monitoring: how to make the most of a contaminated multivariate sample....
    • Coretto P, Hennig C (2016) Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian...
    • Dang UJ, Browne RP, McNicholas PD (2015) Mixtures of multivariate power exponential distributions. Biometrics 71(4):1081–1089
    • Di Zio M, Guarnera U, Rocci R (2007) A mixture of mixture models for a classification problem: the unity measure error. Comput Stat Data Anal...
    • Dotto F, Farcomeni A (2019) Robust inference for parsimonious model-based clustering. J Stat Comput Simul 89:414–442
    • Dotto F, Farcomeni A, García-Escudero LA, Mayo-Iscar A (2017) A fuzzy approach to robust regression clustering. Adv Data Anal Classif 11:691–710
    • Dotto F, Farcomeni A, García-Escudero LA, Mayo-Iscar A (2018) A reweighting approach to robust clustering. Stat Comput 28:477–493
    • Doukhan P (1994) Mixing, vol 85. Lecture notes in statistics. Springer, Berlin
    • Embrechts P, Klüppelberg C, Mikosch T (2008) Modelling extremal events for insurance and finance. Springer, New York
    • Esary J, Proschan F, Walkup D (1967) Association of random variables, with applications. Ann Math Stat 38:1466–1474
    • Farcomeni A (2007) Some results on the control of the false discovery rate under dependence. Scand J Stat 34:275–297
    • Farcomeni A (2009) Robust double clustering: a method based on alternating concentration steps. J Classif 26:77–101
    • Farcomeni A (2014) Robust constrained clustering in presence of entry-wise outliers. Technometrics 56:102–111
    • Farcomeni A, Dotto F (2018) The power of (extended) monitoring in robust clustering. Stat Methods Appl 27:651–660
    • Farcomeni A, Greco L (2015) Robust methods for data reduction. CRC Press, Boca Raton
    • Franczak BC, Browne RP, McNicholas PD (2014) Mixtures of shifted asymmetric Laplace distributions. IEEE Trans Pattern Anal Mach Intell 36(6):1149–1157
    • Gallegos MT, Ritter G (2005) A robust method for cluster analysis. Ann Stat 33(1):347–380
    • García-Escudero L, Gordaliza A, Matrán C, Mayo-Iscar A (2011) Exploring the number of groups in robust model-based clustering. Stat Comput...
    • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2008) A general trimming approach to robust cluster analysis. Ann Stat 36:1324–1345
    • García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A (2010) A review of robust clustering methods. Adv Data Anal Classif 4:89–109
    • Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
    • Li J (2004) Clustering based on a multilayer mixture model. J Comput Gr Stat 14:547–568
    • Lin TI (2009) Maximum likelihood estimation for multivariate skew normal mixture models. J Multivar Anal 100(2):257–265
    • Mazza A, Punzo A (2017) Mixtures of multivariate contaminated normal regression models. Stat Pap. https://doi.org/10.1007/s00362-017-0964-y
    • Meng X-L, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278
    • Morris K, Punzo A, McNicholas PD, Browne RP (2019) Asymmetric clusters and outliers: mixtures of multivariate contaminated shifted asymmetric...
    • Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Stat Comput 10(4):339–348
    • Punzo A, Blostein M, McNicholas PD (2020) High-dimensional unsupervised classification via parsimonious contaminated mixtures. Pattern Recognit...
    • Punzo A, Mazza A, McNicholas P (2018) ContaminatedMixt: an R package for fitting parsimonious mixtures of multivariate contaminated normal...
    • Punzo A, McNicholas PD (2016) Parsimonious mixtures of multivariate contaminated normal distributions. Biomet J 58(6):1506–1537
    • Punzo A, McNicholas PD (2017) Robust clustering in regression analysis via the contaminated Gaussian cluster-weighted model. J Classif 34(2):249–293
    • Riani M, Atkinson AC, Cerioli A, Corbellini A (2019) Efficient robust methods via monitoring for clustering and multivariate data analysis....
    • Ritter G (2015) Robust cluster analysis and variable selection. CRC Press, Boca Raton
    • Ruwet C, García-Escudero LA, Gordaliza A, Mayo-Iscar A (2013) On the breakdown behavior of the tclust clustering procedure. Test 22(3):466–487
    • Schott JR (2016) Matrix analysis for statistics. Wiley series in probability and statistics, Wiley, Hoboken
    • Stephens M (2000) Dealing with label switching in mixture models. J R Stat Soc Ser B Stat Methodol 62(4):795–809
    • Tukey JW (1960) A survey of sampling from contaminated distributions. In: Olkin I (ed) Contributions to probability and statistics: essays...
    • Zhang J, Liang F (2010) Robust clustering using exponential power mixtures. Biometrics 66(4):1078–1086
