


Abstract of Clustering in high dimension for multivariate and functional data using extreme kurtosis projections

Carolina Rendón Aguirre

  • Cluster analysis seeks to determine whether a multivariate sample contains clusters. It is carried out by algorithms that differ significantly in their notion of what constitutes a cluster and in how they find clusters efficiently. This thesis addresses large data problems, so we consider algorithms that use dimension reduction techniques to identify interesting structures in large data sets, in particular those that use the kurtosis coefficient to detect the clusters present in the data.

    The thesis extends the work of Peña and Prieto (2001a), who identified clusters in multivariate data using the univariate projections of the sample data onto the directions that minimize and maximize the kurtosis coefficient of the projected data, and of Peña et al. (2010), who used the eigenvalues of a kurtosis matrix to reduce the dimension.

    This thesis has two main contributions:

    First, we prove that the extreme kurtosis projections have some optimality properties for mixtures of normal distributions, and we propose an algorithm to identify clusters when both the data dimension and the number of clusters present in the sample are high. The performance of the algorithm is analyzed through a simulation study and compared with that of the MCLUST, K-means and CLARA methods.

    Second, we propose an extension of multivariate kurtosis to functional data and analyze some of its properties for clustering. Additionally, we propose an algorithm based on kurtosis projections for functional data and compare its results with those of Functional Principal Components, Functional K-means and FunClust.

    The thesis is structured as follows. Chapter 1 is an introductory chapter that reviews some theoretical concepts used throughout the thesis.

    In Chapter 2 we review in detail the concept of kurtosis and the different interpretations that have been given to it in the literature. In addition, we give a detailed description of some algorithms proposed in the literature that use the kurtosis coefficient to detect the clusters present in the data.

    In Chapter 3 we study the directions that may be useful for detecting the different clusters in the sample and analyze how the extreme kurtosis directions are related to them. In addition, we present a clustering algorithm for high-dimensional data using extreme kurtosis directions.

    In Chapter 4 we introduce an extension of the multivariate kurtosis to functional data and analyze whether the properties of the kurtosis regarding the identification of clusters are preserved. In addition, we present a clustering algorithm for functional data using extreme kurtosis directions.

    We finish with some remarks and conclusions in a concluding chapter.
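
As a rough illustration of the idea behind extreme kurtosis projections, the following sketch projects a two-cluster sample onto candidate directions and picks the directions with the smallest and largest kurtosis coefficient. It is not the algorithm proposed in the thesis: the function names (`projected_kurtosis`, `extreme_kurtosis_directions_2d`), the simulated data and the brute-force grid search over angles in the plane are all assumptions made for this example. The only ingredient taken as given is the standard kurtosis coefficient, the fourth central moment divided by the squared variance.

```python
import numpy as np

def projected_kurtosis(X, d):
    """Kurtosis coefficient of the rows of X projected onto direction d."""
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)          # work with a unit direction
    z = X @ d                          # univariate projection
    z = z - z.mean()                   # center before taking moments
    return np.mean(z**4) / np.mean(z**2) ** 2

def extreme_kurtosis_directions_2d(X, n_angles=360):
    """Hypothetical helper: grid search over unit directions in the plane."""
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    kurt = np.array([projected_kurtosis(X, d) for d in dirs])
    return dirs[kurt.argmin()], dirs[kurt.argmax()], kurt

# Simulated data (assumption): two equally sized normal clusters
# separated along the first coordinate axis.
rng = np.random.default_rng(0)
n = 500
X = np.vstack([
    rng.normal(loc=[-3.0, 0.0], scale=1.0, size=(n, 2)),
    rng.normal(loc=[3.0, 0.0], scale=1.0, size=(n, 2)),
])

d_min, d_max, kurt = extreme_kurtosis_directions_2d(X)

# Split the sample at the mean of the minimum-kurtosis projection.
z = X @ d_min
labels = (z > z.mean()).astype(int)
```

For a balanced mixture like this one, the projection along the line joining the two cluster means is bimodal and has a kurtosis coefficient well below 3, the value for a single normal distribution, so the minimum-kurtosis direction tends to line up with the separation between the clusters. A grid search only works in very low dimension; a high-dimensional setting would need a numerical optimizer or the dimension reduction discussed in the abstract.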

