David Atienza González
Machine learning has become an important tool for building models from the large amounts of data now available. These models are useful for many different tasks, such as classification, clustering, probability density estimation, and anomaly detection.
This thesis is primarily concerned with the uncertainty that is usually present in data. Analyzing this uncertainty helps to better understand the process under study. A commonly used technique is to estimate the underlying probability distribution that generated the data, which is unknown for most real-world data. This estimation can be performed with two types of models: parametric and nonparametric. Parametric models assume that the underlying probability distribution belongs to a given class, and the objective is to find the parameter values that best fit the data. In contrast, nonparametric models relax the assumptions on the underlying probability distribution and generate the estimate directly from the data. However, nonparametric models perform poorly on high-dimensional data, a problem often referred to in the literature as the curse of dimensionality.
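The contrast between the two types of models can be illustrated with a minimal one-dimensional sketch. It is not taken from the thesis: the synthetic bimodal data, the function names, and the rule-of-thumb bandwidth are all assumptions chosen for illustration. The parametric estimate assumes a single Gaussian and fits its two parameters, while the nonparametric estimate is a Gaussian kernel density estimate (KDE) built directly from the samples.

```python
import math
import random
import statistics

def gaussian_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_parametric(data):
    """Parametric estimate: assume one Gaussian and fit its mean and std."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # maximum-likelihood standard deviation
    return lambda x: gaussian_pdf(x, mu, sigma)

def fit_kde(data, bandwidth=None):
    """Nonparametric estimate: average of Gaussian kernels centred on the samples."""
    n = len(data)
    if bandwidth is None:
        # Rule-of-thumb bandwidth sigma * n^(-1/5); an assumption, not a tuned value.
        bandwidth = statistics.stdev(data) * n ** (-1.0 / 5.0)
    return lambda x: sum(gaussian_pdf(x, xi, bandwidth) for xi in data) / n

# Synthetic bimodal data (hypothetical): two well-separated clusters.
rng = random.Random(0)
data = ([rng.gauss(-3.0, 0.5) for _ in range(200)]
        + [rng.gauss(3.0, 0.5) for _ in range(200)])

parametric = fit_parametric(data)
kde = fit_kde(data)

for x in (-3.0, 0.0, 3.0):
    print(f"x={x:+.1f}  parametric={parametric(x):.4f}  kde={kde(x):.4f}")
```

On data like these, the single-Gaussian assumption places most of its mass between the two clusters, where there are no samples, whereas the KDE follows both modes. The price is that the KDE needs every sample at evaluation time and degrades as the dimension grows.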
Bayesian networks are probabilistic graphical models that factorize a joint probability distribution into a product of conditional probability distributions, taking advantage of the conditional independences in the distribution. This converts the estimation of a high-dimensional probability distribution into the estimation of several low-dimensional conditional probability distributions. Thus, in this thesis we propose the class of semiparametric Bayesian networks, which model the low-dimensional conditional probability distributions using either parametric or nonparametric models. This novel class generalizes two common classes of Bayesian networks in the state of the art. Moreover, semiparametric Bayesian networks can be learned using an adaptation of standard learning algorithms for Bayesian networks. In addition, we propose an extension of semiparametric Bayesian networks that can model hybrid data containing both discrete and continuous variables.
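The factorization idea can be sketched on a hypothetical three-node network A → B, A → C, whose joint log-density is log f(a) + log f(b | a) + log f(c | a). This is an illustrative assumption, not the implementation used in the thesis: the class names, the synthetic data, the bandwidth rule, and the hand-picked choice of which node is parametric are all invented here, and no structure learning is performed. A and B use linear Gaussian conditionals (parametric), while C uses a conditional KDE computed as the ratio of a joint KDE to a marginal KDE (nonparametric).

```python
import math
import random
import statistics

def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

class LinearGaussianCPD:
    """Parametric CPD: child ~ Normal(b0 + b1 * parent, sigma)."""
    def __init__(self, child, parent=None):
        # A root node is handled as a constant parent, which forces the slope to 0.
        parent = parent if parent is not None else [0.0] * len(child)
        mp, mc = statistics.fmean(parent), statistics.fmean(child)
        sxx = sum((p - mp) ** 2 for p in parent)
        sxy = sum((p - mp) * (c - mc) for p, c in zip(parent, child))
        self.b1 = sxy / sxx if sxx > 0 else 0.0
        self.b0 = mc - self.b1 * mp
        resid = [c - self.b0 - self.b1 * p for p, c in zip(parent, child)]
        self.sigma = math.sqrt(statistics.fmean([r * r for r in resid]))

    def logpdf(self, child_value, parent_value=0.0):
        return normal_logpdf(child_value, self.b0 + self.b1 * parent_value, self.sigma)

class ConditionalKDECPD:
    """Nonparametric CPD: f(child | parent) = KDE(child, parent) / KDE(parent)."""
    def __init__(self, child, parent):
        n = len(child)
        self.child, self.parent = list(child), list(parent)
        # Rule-of-thumb bandwidths for a 2D product Gaussian kernel (an assumption).
        self.hc = statistics.stdev(child) * n ** (-1.0 / 6.0)
        self.hp = statistics.stdev(parent) * n ** (-1.0 / 6.0)

    def logpdf(self, child_value, parent_value):
        log_w = [normal_logpdf(parent_value, p, self.hp) for p in self.parent]
        log_joint = [w + normal_logpdf(child_value, c, self.hc)
                     for w, c in zip(log_w, self.child)]
        # The 1/n factors of the two KDEs cancel in the ratio.
        return logsumexp(log_joint) - logsumexp(log_w)

# Synthetic data (hypothetical): B is linear in A, C is nonlinear in A.
rng = random.Random(0)
a = [rng.gauss(0.0, 1.0) for _ in range(300)]
b = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in a]
c = [x * x + rng.gauss(0.0, 0.3) for x in a]

cpd_a = LinearGaussianCPD(a)        # root node, parametric
cpd_b = LinearGaussianCPD(b, a)     # parametric conditional
cpd_c = ConditionalKDECPD(c, a)     # nonparametric conditional

def joint_logpdf(a_val, b_val, c_val):
    """Joint log-density as the sum of the three low-dimensional conditionals."""
    return (cpd_a.logpdf(a_val)
            + cpd_b.logpdf(b_val, a_val)
            + cpd_c.logpdf(c_val, a_val))

print(joint_logpdf(1.0, 3.0, 1.0), joint_logpdf(1.0, 3.0, 6.0))
```

Each conditional involves at most two variables, so the KDE never has to operate in the full three-dimensional space. The linear Gaussian node suits the linear relation between A and B, and the KDE node captures the nonlinear relation between A and C that a linear Gaussian conditional would miss.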
Anomaly detection is the process of detecting events that differ significantly from the normal behavior of a system. This task is often approached by detecting low-probability events, since anomalies are rare. It has many applications, particularly in industry, where errors in production must be identified as early as possible. In this thesis, we perform anomaly detection in a real laser heat-treatment process used in the automotive industry. Two different approaches are proposed to detect anomalies. In the first approach, the laser movement is tracked so that the high-dimensional source data is transformed into low-dimensional data. Then, a grid of nonparametric models is used to detect anomalies. The second approach models the high-dimensional source data using semiparametric Bayesian networks. Both approaches take into account the temporal characteristics of the data and exhibit promising capabilities to detect anomalies.
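The idea of flagging low-probability events can be sketched with a one-dimensional KDE and a log-density threshold. This is a deliberately simplified illustration and not the method of the thesis: the synthetic readings, the fixed bandwidth, and the 1st-percentile threshold are assumptions, and the sketch ignores both the grid of models and the temporal structure mentioned above.

```python
import math
import random

def kde_logpdf(x, data, bandwidth):
    """Log-density at x of a Gaussian KDE fitted to data (log-sum-exp for stability)."""
    exponents = [-0.5 * ((x - xi) / bandwidth) ** 2 for xi in data]
    m = max(exponents)
    log_sum = m + math.log(sum(math.exp(e - m) for e in exponents))
    return log_sum - math.log(len(data) * bandwidth * math.sqrt(2.0 * math.pi))

# Hypothetical readings recorded while the process behaves normally.
rng = random.Random(1)
normal_data = [rng.gauss(50.0, 2.0) for _ in range(300)]
bandwidth = 0.8  # assumed value; in practice it would be selected from the data

# Threshold: the 1st percentile of the log-density of the normal data itself.
# Each training point contributes its own kernel, so these scores are slightly optimistic.
train_scores = sorted(kde_logpdf(x, normal_data, bandwidth) for x in normal_data)
threshold = train_scores[int(0.01 * len(train_scores))]

def is_anomaly(x):
    """Flag x when its estimated log-density falls below the threshold."""
    return kde_logpdf(x, normal_data, bandwidth) < threshold

for reading in (50.0, 53.0, 70.0):
    print(reading, is_anomaly(reading))
```

The percentile sets the fraction of normal data that would be flagged, so it trades false alarms against missed anomalies. Working with log-densities avoids numerical underflow for readings far from every training sample.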