Publication: Scalable Outlier Detection Methods for Functional Data
Identifiers
Publication date
2022-10-11
Defense date
2022-11-30
Authors
Tutors
Abstract
Recent technological advances have led to an exponential growth in the volume of data
generated. The quest to make sense of these data, many of which are complex,
has spurred interest in the development of statistical methods for analysing data with
complex structures. One such field of interest is functional data analysis (FDA), which
deals with the analysis of data that can be considered as functions, curves, or surfaces
observed over a domain set. Outlier detection is a challenging but important part of
the exploratory analysis process in FDA because functional observations can exhibit
outlyingness in various ways compared to the bulk of the data. This thesis addresses
the problem of detecting and classifying outliers in functional data with three main
contributions.
First, the fdaoutlier R package is presented in Chapter 2. The package contains
implementations of some of the state-of-the-art functional outlier detection methods
in the literature. Some of the methods implemented include directional outlyingness,
magnitude-shape plot, sequential transformations, total variation depth, and modified
shape similarity index. Detailed illustrations of the functions of the package are provided,
using various simulated and real functional datasets curated from the functional
outlier detection literature. Overviews of the functional outlier detection methods implemented
in the package are also presented in Chapter 2. This chapter therefore serves
as a review of some of the current literature in outlier detection for functional data.
Next, two new methods, named ‘Semifast-MUOD’ and ‘Fast-MUOD’, are presented
in Chapter 3. These methods work by computing for each curve three indices (magnitude,
amplitude and shape index) that measure the outlyingness of that curve in terms
of its magnitude, amplitude and shape. ‘Semifast-MUOD’ computes these indices with
respect to (w.r.t.) a random sample of the dataset, while ‘Fast-MUOD’ computes these
indices w.r.t. the point-wise or L1 median. The classical boxplot is then used as a
cutoff on the three indices to identify curves that are outliers of different types. A byproduct
of the methods is an unsupervised classification of the outliers into different
types, without the need for visualisation. Performance evaluation of the methods, using
various real and simulated datasets, shows that Fast-MUOD is the better of the two newly proposed methods for outlier detection, in addition to being very scalable. Comparisons
with the latest functional outlier detection methods in the literature also show
superior or comparable outlier detection performance.
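The idea behind Fast-MUOD can be illustrated with a minimal Python sketch. It is not the fdaoutlier implementation, and the abstract does not state the index formulas, so the following are assumptions: each curve is regressed on the point-wise median, the magnitude index is the absolute intercept, the amplitude index is the absolute deviation of the slope from 1, and the shape index is the absolute deviation of the Pearson correlation from 1. The boxplot cutoff is assumed to be the upper fence Q3 + 1.5 IQR, and all function names are hypothetical.

```python
import math
import random
from statistics import mean, median, quantiles


def pointwise_median(curves):
    """Median across all curves at each evaluation point."""
    return [median(values) for values in zip(*curves)]


def muod_indices(curve, reference):
    """Return (magnitude, amplitude, shape) indices of a curve w.r.t. a reference.

    Assumed definitions (not taken from the abstract): intercept, slope and
    correlation of the curve regressed on the reference. Both inputs must be
    non-constant, otherwise the variances below are zero.
    """
    mean_c, mean_r = mean(curve), mean(reference)
    cov = sum((c - mean_c) * (r - mean_r) for c, r in zip(curve, reference))
    var_c = sum((c - mean_c) ** 2 for c in curve)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    beta = cov / var_r                       # regression slope
    alpha = mean_c - beta * mean_r           # regression intercept
    rho = cov / math.sqrt(var_c * var_r)     # Pearson correlation
    return abs(alpha), abs(beta - 1.0), abs(rho - 1.0)


def boxplot_upper_fence(values):
    """Classical boxplot upper fence: Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def fast_muod_sketch(curves):
    """Flag curve positions whose index exceeds the boxplot fence, per type."""
    reference = pointwise_median(curves)
    indices = [muod_indices(curve, reference) for curve in curves]
    flagged = {}
    for k, name in enumerate(("magnitude", "amplitude", "shape")):
        column = [row[k] for row in indices]
        fence = boxplot_upper_fence(column)
        flagged[name] = [i for i, value in enumerate(column) if value > fence]
    return flagged


# Simulated data: 20 noisy sine curves plus three planted outliers.
rng = random.Random(0)
grid = [i / 49 for i in range(50)]
base = [math.sin(2 * math.pi * t) for t in grid]
curves = [[b + rng.gauss(0.0, 0.05) for b in base] for _ in range(20)]
curves.append([b + 5.0 for b in base])                   # position 20: shifted
curves.append([3.0 * b for b in base])                   # position 21: scaled
curves.append([math.cos(2 * math.pi * t) for t in grid]) # position 22: different shape

flagged = fast_muod_sketch(curves)
print(flagged)
```

In this sketch the returned dictionary already groups the flagged curves by outlier type, which mirrors the unsupervised classification described above. A single curve can be flagged on more than one index.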
In Chapter 4, some theoretical properties of the Fast-MUOD indices are presented.
These include some definitions of the indices, as well as convergence proofs of the sample
approximations. Some properties of the indices under simple transformations are
also presented in this chapter. Finally, three techniques are presented in Chapter 5 for
extending the Fast-MUOD indices to outlier detection in multivariate functional data
observed on the same domain. These techniques include the use of random projections
and identifying outliers on the marginal components of the multivariate functional data.
The use of random projections gave the best results in performance evaluations with
various real and simulated datasets.
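The random projection step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure from the thesis: a direction is drawn uniformly on the unit sphere by normalising a Gaussian vector, and each multivariate curve is reduced to a univariate curve by taking the inner product with that direction at every evaluation point. The function names and the data layout (a list of curves, each a list of p-dimensional points) are hypothetical, and the abstract does not say how results from several projections are combined.

```python
import math
import random


def random_unit_direction(p, rng):
    """Draw a direction uniformly on the unit sphere in p dimensions."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def project_curves(mv_curves, direction):
    """Project each multivariate curve onto a direction, point by point.

    mv_curves[i][t] is the p-dimensional value of curve i at evaluation
    point t; the result holds one univariate curve per input curve.
    """
    return [
        [sum(d * x for d, x in zip(direction, point)) for point in curve]
        for curve in mv_curves
    ]


# Two bivariate curves observed at three common evaluation points.
mv_curves = [
    [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)],
    [(0.5, 0.0), (1.5, 0.0), (2.5, 0.0)],
]
direction = random_unit_direction(2, random.Random(1))
univariate = project_curves(mv_curves, direction)
print(univariate)
```

Each projected dataset is univariate functional data, so indices designed for the univariate case can be computed on it directly.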
Chapter 6 contains some concluding remarks and possible future research work.
Description
International Mention in the doctoral degree (Mención Internacional en el título de doctor)
Keywords
Outlier detection, Functional data analysis, Semifast-MUOD, Fast-MUOD, R package fdaoutlier