Documat


Abstract of Depth-based methods for functional data analysis

Antonio Elías Fernández

  • In the last few decades, technological advances have simplified data collection and storage and reduced their cost. We live in an expanding data-driven world in which almost every aspect is sensed in real time. Devices can transfer data, communicate with each other and make decisions without any human interaction or supervision. Smartphones, wearables, social networks, weather stations, air quality sensors, traffic flow and Global Positioning Systems are only some of the data sources of the Internet-of-Things era. This new paradigm has moved statisticians and practitioners from a data-poor framework to one in which they confront not only big data sets but also complex data structures (Ranjan et al., 2018; Galeano and Peña, 2019).

    Functional data analysis (FDA) arises naturally in this context to exploit information that is recorded over a continuum such as time or space. Formally, a functional data sample X_1(t), …, X_n(t) consists of n independent and identically distributed functions observed for all t in [a,b]. FDA is becoming common in practice and, in fact, this thesis analyses various functional data sets from very different sources and research areas: daily electricity supply functions, daily electricity demand functions, medical imaging of the internal carotid artery, daily nitrogen monoxide emissions, spatio-temporal maximum temperatures and micro-level spatio-temporal mortality rates. The literature has developed many techniques and tools to take advantage of this kind of data. The monographs by Ramsay and Silverman (2005) and Ferraty and Vieu (2006) provide an overview of the classical methods for functional data analysis.

    In particular, this thesis focuses on the study of functional depth measures and their applications. Chapter 2 presents a literature review that ranges from the fundamentals of the concept to the currently available applications, paying particular attention to the methods that are applied or extended in later chapters of the thesis.

    As shown there, the concept of depth has received a great deal of attention in non-parametric statistics and, particularly, in FDA. Many different definitions of functional depth have been proposed, and the literature has shown a strong interest in their theoretical properties, with a view to a consistent formalization of the concept. Depth measures have motivated depth-based methodologies that have proved to be remarkable tools for visualization, outlier detection, classification, non-parametric testing and clustering.
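
    To make the idea of a functional depth concrete, the following is a minimal sketch of one generic integrated-type depth: the univariate halfspace depth is computed pointwise and then averaged over the grid. It is an illustration only and is not claimed to be one of the definitions studied in the thesis. The function name `integrated_depth` and the assumption that all curves are sampled on a common grid are ours.

```python
import numpy as np

def integrated_depth(curves):
    """Integrated-type depth of each curve within a sample.

    curves: array of shape (n, m), n curves evaluated on a common grid
            of m points (an assumption of this sketch).
    Returns an array of n depth values; larger means more central.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    xi = curves[:, None, :]  # curve whose depth is computed, index i
    xj = curves[None, :, :]  # curves it is compared with, index j
    # Fraction of sample curves at or below / at or above curve i at each t.
    below = (xj <= xi).sum(axis=1) / n
    above = (xj >= xi).sum(axis=1) / n
    # Pointwise univariate halfspace depth, then average over the grid.
    pointwise = np.minimum(below, above)
    return pointwise.mean(axis=1)

# Three constant curves at levels 0, 1 and 2: the middle one is the deepest.
sample = np.array([[0.0] * 5, [1.0] * 5, [2.0] * 5])
depths = integrated_depth(sample)
deepest = int(np.argmax(depths))
```

    Ordering the curves by such a depth value gives the centre-outward ranking on which the depth-based tools for visualization, outlier detection and classification are built.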

    This thesis broadens the list of problems that can be approached with depth-based methods. Specifically, Chapter 4 addresses the problem of forecasting functional time series. In this context, the functional realizations X_1(t), …, X_n(t) are no longer independent; in fact, the goal in time series analysis is to study the temporal dependence. Formally, the realizations X_i, i = 1,…,n, are indexed in time and each X_i is a random function.

    In real applications it is increasingly common for the available functions to be systematically incomplete or partially observed. Sources of censoring include, for example, patients missing medical visits or sensors failing to record meteorological or air quality data. This means that each function X_i(t) is not observed for all t in [a,b] but only on subsets of [a,b], which invalidates many of the existing methodologies for FDA.

    Partially observed functional data also hamper the applicability of depth-based methodologies, since the currently available depth measures are not suitable for this context. Chapter 3 introduces, to the best of our knowledge, the first functional depth measures for partially observed functional data. As a by-product of our proposal, depth-based methodologies can be applied to partially observed functional data, enlarging the palette of tools available for such challenging data sets. Chapter 3 illustrates how to deal with visualization, outlier detection and classification problems in simulated and real partially observed functional data sets.
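
    One plausible way to adapt a pointwise depth to this setting can be sketched as follows. This is a hedged illustration under our own assumptions and not necessarily the definition proposed in Chapter 3. Unobserved values are encoded as NaN on a common grid, the pointwise depth at each t uses only the curves observed at t, and each curve's depth is averaged over its own observed domain with weights proportional to the number of curves observed at t. The name `partial_integrated_depth` is hypothetical.

```python
import numpy as np

def partial_integrated_depth(curves):
    """Integrated-type depth for curves with unobserved parts (NaN).

    curves: array of shape (n, m) on a common grid; NaN marks the grid
            points at which a curve is not observed (an assumption).
    Returns n depth values; NaN for a curve with no observed points.
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    observed = ~np.isnan(curves)
    n_obs = observed.sum(axis=0)  # number of curves observed at each t
    xi = curves[:, None, :]
    xj = curves[None, :, :]
    # Comparisons involving NaN are False, so unobserved values never count.
    with np.errstate(invalid="ignore"):
        below = (xj <= xi).sum(axis=1)
        above = (xj >= xi).sum(axis=1)
    # Pointwise halfspace depth relative to the curves observed at t.
    pointwise = np.minimum(below, above) / np.maximum(n_obs, 1)
    # Weight each t by how many curves are observed there, and only
    # where the curve itself is observed.
    weights = np.where(observed, n_obs, 0).astype(float)
    total = weights.sum(axis=1)
    depth = np.full(n, np.nan)
    ok = total > 0
    depth[ok] = (pointwise * weights).sum(axis=1)[ok] / total[ok]
    return depth

# Three curves on a 3-point grid, two of them with one missing value.
partial_sample = np.array([
    [0.0, 0.0, np.nan],
    [1.0, 1.0, 1.0],
    [2.0, np.nan, 2.0],
])
partial_depths = partial_integrated_depth(partial_sample)
```

    With fully observed curves all weights are equal and the sketch reduces to a plain average of the pointwise depth over the grid.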

    The forecasting methodology for functional time series of Chapter 4 and the depth measure for partially observed data of Chapter 3 together provide the resources to develop a depth-based method for reconstructing partially observed functional data. The literature on functional data analysis has recently tackled the problem of estimating the missing parts; however, the existing reconstruction methods rely strongly on a proper estimation of the mean and covariance functions. Unfortunately, this task is sometimes difficult because of the scarcity of data, the presence of complex covariance structures or the sharp nature of the functions. To provide a satisfactory solution in such cases, Chapter 5 introduces a depth-based reconstruction method that does not require estimating the covariance and is particularly useful for non-smooth functions.

    Finally, Chapter 6 presents the main conclusions of the thesis.

