

Summary of Explainability for machine learning

Pau Figuera

  • Spanish

    The leitmotif of this thesis is the search for interpretations with explainable content for Machine Learning. We understand explainability as grounding the methods we develop on sound algebraic and statistical techniques. The starting point is the relationship between Probabilistic Latent Semantic Analysis and the Singular Value Decomposition Theorem. The work rests on interpreting the dimensional structure of the factorization space.
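
    For reference, the formal analogy between symmetric PLSA and the SVD, as usually stated in the literature following Hofmann (standard background, not a new result of the thesis), is

        P(w, d) = \sum_{z} P(z)\, P(w \mid z)\, P(d \mid z)
        \quad\longleftrightarrow\quad
        X = U \Sigma V^{\top},

    with P(w \mid z) playing the role of U, the diagonal P(z) that of \Sigma, and P(d \mid z) that of V, under non-negativity and normalization constraints in place of orthonormality.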

    Under these conditions, the search for the meaning of the diagonal matrix is related to the Fisher kernel. The algebra of matrices with non-negative entries supports these structures naturally, and the result we derive is that this kernel can be obtained in that way. Using the Bregman divergence, we show that the classification error is arbitrarily small while consistency is preserved.
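
    For reference, the Bregman divergence generated by a strictly convex, differentiable function \varphi is (the standard definition, not thesis-specific notation)

        D_\varphi(x, y) = \varphi(x) - \varphi(y) - \langle \nabla \varphi(y),\, x - y \rangle .

    Taking \varphi(x) = \sum_i x_i \log x_i recovers the generalized Kullback-Leibler divergence, the case most relevant to factorizations of non-negative matrices.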

    One consequence we examine is the asymptotic behavior of the traces obtained from these matrices. Their expectation is a statistic modeled by a gamma density, and its estimation is efficient. We apply this result to the clustering problem, which allows us to construct a validation criterion, illustrated by the sketch below. The novel result is that it permits inference (in the statistical sense) in the validation of clusterings.
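
    A minimal, purely illustrative sketch of this kind of criterion in Python (the use of scikit-learn's NMF, the choice of the trace of W^T W as the statistic, and the gamma fit are assumptions of this example, not the thesis's actual algorithm):

        import numpy as np
        from scipy import stats
        from sklearn.decomposition import NMF

        def trace_statistic(X, k, seed=0):
            # Trace of W^T W for a rank-k non-negative factorization X ~ W H.
            W = NMF(n_components=k, init="random", random_state=seed,
                    max_iter=500).fit_transform(X)
            return np.trace(W.T @ W)

        rng = np.random.default_rng(0)
        X = np.abs(rng.normal(size=(100, 20)))  # toy non-negative data

        for k in range(2, 6):
            # Repeated fits give a sample of traces; a gamma density models them.
            traces = [trace_statistic(X, k, seed=s) for s in range(10)]
            shape, loc, scale = stats.gamma.fit(traces, floc=0.0)
            print(f"k={k}: gamma(shape={shape:.2f}, scale={scale:.2f}), "
                  f"mean trace={np.mean(traces):.2f}")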

    We present the theoretical developments that lead to each conclusion. In addition, we provide application examples for the results we have derived.

  • English

    The leitmotif of this thesis is the search for interpretations with explainable content for Machine Learning. We interpret explainability through sound algebraic and statistical techniques. Our starting point is a probabilistic interpretation of the Singular Value Decomposition Theorem. This theorem is fundamental in many branches of pure and applied mathematics, as well as in statistics, and it is the origin of a vast family of concepts and techniques that have given rise to countless Machine Learning algorithms. Initially, we consider the relationship between Probabilistic Latent Semantic Analysis (in its symmetric formulation) and the Singular Value Decomposition Theorem. This analogy was introduced by Hofmann and studied by other authors, who established it as a merely formal relationship without quantitative content. Our work is based on interpreting the dimension of the factorization space, since the latent or hidden variables are especially relevant.
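
    As a numerical illustration of this relationship (a sketch only: the toy data, the scikit-learn estimators, and the rank are arbitrary choices of this example; KL-based NMF is known to coincide with symmetric PLSA up to normalization):

        import numpy as np
        from sklearn.decomposition import NMF, TruncatedSVD

        # Toy document-term counts, normalized to a joint probability matrix.
        rng = np.random.default_rng(1)
        counts = rng.poisson(2.0, size=(50, 30)).astype(float)
        P = counts / counts.sum()

        k = 5
        svd = TruncatedSVD(n_components=k, random_state=1).fit(P)
        svd_err = np.linalg.norm(P - svd.inverse_transform(svd.transform(P)))

        # KL-based NMF, i.e. a symmetric-PLSA-style factorization of P.
        nmf = NMF(n_components=k, beta_loss="kullback-leibler", solver="mu",
                  max_iter=1000, random_state=1)
        W = nmf.fit_transform(P)
        nmf_err = np.linalg.norm(P - W @ nmf.components_)

        print(f"rank-{k} SVD residual:      {svd_err:.4f}")
        print(f"rank-{k} PLSA/NMF residual: {nmf_err:.4f}")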

    The first result we obtain is the set of conditions for equivalence: the orthonormalization of a data matrix transformed to the space of probabilities is its singular value decomposition. This result requires that the dimensionality of the representation space of the non-negative matrix be greater than or equal to that of the data matrix. Under these conditions, the search for the significance of the diagonal matrix, which is related to inertia, leads us to the Fisher kernel. The covariance matrix of the Fisher kernel is diagonal, and it is a measure of information, its reciprocal being the covariance matrix. The algebra of matrices with non-negative entries inherently supports these data structures. The result we derive is that the weight-matrix kernel obtained by factorizing the product of non-negative matrices is the Fisher information matrix; it generalizes several types of kernels and, in the limit case, yields the geodesic distance.
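
    For reference, the Fisher kernel of Jaakkola and Haussler (the standard definition; the thesis derives it from the factorization of non-negative matrices) is

        K(x, x') = g_x^{\top}\, \mathcal{I}^{-1}\, g_{x'},
        \qquad
        g_x = \nabla_{\theta} \log p(x \mid \theta),
        \qquad
        \mathcal{I} = \mathbb{E}\left[ g_x\, g_x^{\top} \right],

    where \mathcal{I} is the Fisher information matrix; when \mathcal{I} is diagonal, each latent coordinate contributes independently, matching the role of the diagonal matrix above.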

    Furthermore, each choice leads to the desired properties for the results. This kernel also provides another property: the margins are flat structures, so the misclassification error is arbitrarily small. One consequence we examine is the asymptotic behavior of the sequence of traces obtained from these matrices: their expectation is a statistic modeled by a gamma density. The estimator attains the Cramér-Rao bound, so it is efficient. Furthermore, it is the posterior of a Poisson distribution. We apply this result to the clustering problem, which allows us to build a validation criterion. The novel result is that inference for clustering validation becomes possible. Several examples show that the maxima of our densities are very similar to the values obtained with the Silhouette and Gap statistic indices. The statement is non-parametric and induces a metric, just as in the parametric case.
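
    The Poisson-gamma relationship invoked here is standard conjugacy; a minimal numerical check (the prior parameters and sample size are arbitrary choices of this example):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)

        # Gamma(a0, rate b0) prior on the Poisson rate lambda.
        a0, b0 = 2.0, 1.0
        lam_true = 3.5
        x = rng.poisson(lam_true, size=200)

        # Conjugate update: the posterior is Gamma(a0 + sum(x), rate b0 + n).
        a_post, b_post = a0 + x.sum(), b0 + len(x)
        posterior = stats.gamma(a=a_post, scale=1.0 / b_post)

        print(f"true rate:      {lam_true}")
        print(f"posterior mean: {posterior.mean():.3f}")
        print(f"95% interval:   {posterior.interval(0.95)}")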

    Parameterization implies the existence of a differentiable manifold in the parameter space. In this case, the parameters are the data matrices and/or vectors (their product reconstructs the original data with no loss of information; they are, therefore, non-sufficient minimal statistics).
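
    The manifold statement is the usual information-geometric one: the Fisher information induces a Riemannian metric on the parameter space (standard background, stated here only to fix notation),

        g_{ij}(\theta) = \mathbb{E}_{\theta}\left[
            \frac{\partial \log p(x \mid \theta)}{\partial \theta_i}\,
            \frac{\partial \log p(x \mid \theta)}{\partial \theta_j}
        \right],

    whose geodesic distance is the limit case referred to above.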

