Biopotential signals and their applicability to cybersecurity problems

Caterina Fuster Barceló

Author: Caterina Fuster Barceló
Thesis supervisors: María Carmen Cámara Núñez (supervisor), Pedro Peris López (co-supervisor)
Defended: at the Universidad Carlos III de Madrid (Spain) in 2022
Language: English
Thesis examination committee: Luis Hernández Encinas (president), Ana Isabel González-Tablas Ferreres (secretary), Joaquin Arias Herrero (member)
Links
- Open-access thesis at: e-Archivo
Abstract
- Spanish
  Biometric systems are an identification technique on the rise today. In recent years, many different systems have been used in everyday life, such as fingerprint, face scanning, or the ECG, among others. In fact, more than 20 years of research support the Elektrokardiogramm (EKG), or Electrocardiogram (ECG), as a reliable method for user identification. This thesis proposes a new biometric identification method called ELEKTRA. However, the current state of the art in EKG identification has some drawbacks. Many systems use very complex Deep Learning architectures or extract the relevant features through fiducial analysis, making the biometric system too complex or costly. An important flaw, not only in biometric systems, is the lack of public databases and the use of private databases for research. Using private databases in any study makes the experiments and results irreproducible, which is a drawback in any field of science. In this doctoral thesis, ELEKTRA, a biometric identification system, has been developed using images called Elektrokardiomatrix (EKM). These images are built as a heatmap of a set of aligned R-peaks (heartbeats), forming a matrix. To offer reproducible results, four different public databases are used to demonstrate the model's feasibility and adaptability: the Normal Sinus Rhythm Database (NSRDB), the MIT-BIH Arrhythmia Database (MIT-BIHDB), the Physikalisch-Technische Bundesanstalt database (PTBDB), and the Glasgow University Database (GUDB). New sub-databases of EKMs have been created from each of these databases.
Furthermore, to test the adaptability and feasibility of ELEKTRA as a biometric system, a simple yet effective CNN with a single convolutional layer is built. The four databases mentioned above are tested in Chapters 3, 4, and 5. Chapter 3 studies the NSRDB as a proof of concept of identification with control users, and different experiments are carried out to study ELEKTRA's behaviour. The characteristics studied with this database are: how many heartbeats are needed to identify a user; the convergence costs of the presented model; the classification of a never-seen user from a different database; the classification of a user whose EKG signal has been modified by adding Gaussian noise; and the feasibility of ELEKTRA, by testing how many images (EKMs) are enough to identify a user. As for the databases containing users with cardiovascular disease (CVD), the MIT-BIHDB contains patients with arrhythmia and healthy users, and the PTBDB contains patients with different CVDs together with healthy users. These two databases are studied in Chapter 4, which examines ELEKTRA's adaptability to different CVDs. First, the MIT-BIHDB is tested, achieving promising results and showing that ELEKTRA is able to identify users with and without arrhythmia in the same group. Second, the complete PTBDB is used, obtaining high accuracy and low error rates. Finally, ELEKTRA is tested on some PTBDB users with specific CVDs to observe its behaviour when only users with CVD are included. The results of these experiments show that ELEKTRA is able to identify users with and without CVD, approaching a real-world scenario.
Finally, in Chapter 5, ELEKTRA is tested on the GUDB to evaluate user identification performance when users perform different cardiovascular activities. The GUDB consists of 25 users performing five activities with different levels of cardiovascular effort (sitting, walking, taking a maths exam, using a handbike, and running on a treadmill). The proposed biometric system is tested with each of these activities to show that identifying users is more complex when they perform an activity requiring greater cardiovascular effort and, consequently, have a higher heart rate. The experiments consist of merging different activities to study the differences between heart rates and how user identification relates to them. The most representative experiment trains the model on the scenario in which users are sitting and performs blind classification of users from the scenario in which they are running. This experiment yields a very low accuracy, showing that users are harder to identify at higher heart rates. Even so, one of the main advantages of the presented model is that, even with low accuracy, the False Acceptance Rate does not increase compared with the other experiments, meaning that an impostor would not be able to bypass the system. Conversely, when the model is run on all activities merged together, accurate results are obtained, offering an inclusive model for training and testing on users performing different activities. In this way, ELEKTRA contributes to the state of the art by providing a new EKG-based user identification method with many advantages.
The excellent results in terms of high accuracy and low error rates in the experiments support the efficiency of ELEKTRA. Because the databases used for the experiments in this doctoral thesis are publicly available, this work is reproducible. Moreover, since the databases used differ in the users that make up each of them, the proposed identification method can be considered inclusive, as all living beings have their own EKG. High accuracies are also obtained when testing the model on users with different CVDs. In addition, thanks to the GUDB, ELEKTRA shows that identifying users from their EKGs while they perform more demanding cardiovascular activities is more complex. In conclusion, based on the studies carried out in this doctoral thesis, it can be assumed that ELEKTRA is a feasible and efficient identification method for EKG biometrics.
- English
  Biometric systems are an increasingly widespread identification technique in today's world. Many different biometric systems, such as fingerprint, face scan, and ECG, have become part of everyday life in recent years. More than 20 years of research show that the Elektrokardiogramm (EKG), or Electrocardiogram (ECG), is a feasible method for user identification, as each person has a unique and inherent EKG.
  
  A biometric system is based on something that a person is, rather than something they know or possess, and therefore cannot lose: for example, the eye, DNA, palm print, vein patterns, iris, or retina. For this reason, during the last decade, biometric identification and authentication have gained ground over classic authentication methods such as a PIN or a physical key. To be accepted, every biometric system must fulfill a set of requirements, including universality, uniqueness, permanence, and collectability. The EKG is a biometric trait that not only fulfills those requirements but also has some advantages over other biometric traits. Using the EKG as the biometric trait for identification is motivated by four key points: 1) collecting an EKG is non-invasive, which may contribute to its acceptability among the population; 2) a person can only be identified while alive, as their heart must be beating; 3) all living beings have an EKG, so the trait is inclusive; 4) an EKG provides not only identification but also a medical and even emotional diagnosis.
  
  The current state of the art contains many works on user identification with EKGs, deployed with many different techniques. Some works use the fiducial points of the EKG signal (T-peak, R-peak, P-onset, QRS-offset, ...) to perform user identification, while others use features extracted by a Neural Network as the classification or identification method. As the EKG signal can be analysed in both the time and the frequency domains, many different models can exploit the dissimilarity between the EKG signals of different users to perform identification, such as Recurrent Neural Networks, Convolutional Neural Networks, and Long Short-Term Memory networks, as well as other techniques such as Principal Component Analysis, offering very competitive results.
  
  Regarding user identification, and depending on each user's condition, the EKG, as mentioned before, not only serves as an identification method but also offers a diagnosis, whether of a person's medical condition or of their emotional state. Some research has studied the effect of conditions such as anxiety on EKG identification, showing that individuals may be harder to identify at higher heart rates.
  
  Nevertheless, the current state of the art in EKG identification has some drawbacks. Many systems use very complex Deep Learning architectures or, as mentioned, extract features through fiducial analysis, making the biometric system too complex and computationally costly. One important flaw, not only in biometric systems but in science generally, is the lack of publicly available datasets and the use of private ones in different studies. Using a private database makes the experiments and results irreproducible, which can be considered a drawback in any field of science. Furthermore, many of these works use the EKG signal in such a way that it can be recovered from the identification system, so the user's privacy is not protected, as anyone could retrieve their EKG signal.
  
  Owing to these drawbacks of current EKG-based biometric systems, this thesis presents ELEKTRA, a new identification system that aims to overcome the shortcomings of existing proposals. ELEKTRA performs user identification using EKGs converted into a heatmap of a set of aligned R-peaks (heartbeats), forming a matrix called an Elektrokardiomatrix (EKM).
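The EKM construction described above can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: the amplitude-threshold R-peak detector (`detect_r_peaks`), the symmetric window of `half_width` samples around each R-peak, and the min-max normalisation in `build_ekm` are all hypothetical choices; real pipelines usually rely on a dedicated QRS detector such as Pan-Tompkins. Each row of the resulting matrix is one beat aligned on its R-peak, so rendering the matrix as a heatmap gives an EKM-like image.

```python
import numpy as np


def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Hypothetical, very simple R-peak detector: local maxima above a fraction
    of the global maximum, kept at least `refractory_s` seconds apart."""
    x = np.asarray(signal, dtype=float)
    threshold = threshold_ratio * x.max()
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) & (x[1:-1] > threshold)
    candidates = np.flatnonzero(is_peak) + 1
    min_gap = int(refractory_s * fs)
    peaks = []
    for c in candidates:
        if peaks and c - peaks[-1] < min_gap:
            # two candidates too close together: keep the taller one
            if x[c] > x[peaks[-1]]:
                peaks[-1] = c
        else:
            peaks.append(c)
    return np.array(peaks, dtype=int)


def build_ekm(signal, r_peaks, bpf=3, half_width=100):
    """Stack `bpf` consecutive beats, each centred on its R-peak, into a matrix.

    Row k holds signal[r_k - half_width : r_k + half_width], so column
    `half_width` is the aligned R-peak in every row (assumed window shape).
    """
    x = np.asarray(signal, dtype=float)
    # keep only beats whose window lies completely inside the record
    usable = [r for r in r_peaks if r - half_width >= 0 and r + half_width <= len(x)]
    if len(usable) < bpf:
        raise ValueError("not enough complete beats to build one EKM")
    ekm = np.vstack([x[r - half_width:r + half_width] for r in usable[:bpf]])
    # min-max normalisation so the matrix can be rendered as a heatmap image
    lo, hi = ekm.min(), ekm.max()
    return (ekm - lo) / (hi - lo) if hi > lo else np.zeros_like(ekm)


# Toy record: a flat line with four unit "R-peaks", assumed sampled at 250 Hz
fs = 250
record = np.zeros(1000)
record[[150, 400, 650, 900]] = 1.0
peaks = detect_r_peaks(record, fs)
ekm = build_ekm(record, peaks, bpf=3)
print(peaks)       # [150 400 650 900]
print(ekm.shape)   # (3, 200)
```

The resulting matrix could then be displayed with any heatmap routine (for instance Matplotlib's `imshow`) or fed to a classifier as a single-channel image.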
  
  ELEKTRA builds on previous work in which the EKM was created for medical purposes. To the best of our knowledge, all existing research on the EKM focuses on the diagnosis of different cardiovascular diseases (CVDs), such as Congestive Heart Failure and Atrial Fibrillation, or on the analysis of Heart Rate Variability, among others. Therefore, the work presented in this thesis is presumably the first to use the EKM as a valid identification method.
  
  To offer reproducible results, four different public databases are used to show the model's feasibility and adaptability: i) the Normal Sinus Rhythm Database (NSRDB), ii) the MIT-BIH Arrhythmia Database (MIT-BIHDB), iii) the Physikalisch-Technische Bundesanstalt Database (PTBDB), and iv) the Glasgow University Database (GUDB). The first three (i, ii, and iii) are taken from PhysioNet, a freely available repository of medical research data managed by the MIT Laboratory for Computational Physiology. The fourth database (iv) is also freely available on request from the University of Glasgow.
  
  Furthermore, to test the adaptability and feasibility of ELEKTRA as a biometric system, four datasets are built from these databases by segmenting the EKG signals into windows, each of which yields an EKM. The number of EKMs in each dataset depends on the length of the records. For the NSRDB, whose EKG records are very long, 3000 EKMs (images) per user are obtained. For the other three databases, as many EKMs as possible are extracted until the end of each record is reached. It is important to note that, for these three databases, whose recordings are shorter, the number of EKMs obtained depends on the number of heartbeats represented in each EKM: the higher the number of heartbeats or R-peaks per EKM (e.g., 7 beats per frame, bpf), the fewer images are obtained.
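To illustrate why more beats per EKM yields fewer images from a short record, the hypothetical helper below splits the detected R-peaks of a record into consecutive frames of `bpf` beats. The abstract does not say whether windows overlap, so non-overlapping frames (and discarding the incomplete last frame) are assumptions of this sketch.

```python
def frame_r_peaks(r_peaks, bpf):
    """Split R-peak indices into consecutive, non-overlapping groups of `bpf` beats.

    Assumption: frames do not overlap, and trailing beats that cannot fill a
    whole frame are discarded. Each group would produce one EKM.
    """
    if bpf < 1:
        raise ValueError("bpf must be a positive integer")
    n_frames = len(r_peaks) // bpf
    return [list(r_peaks[k * bpf:(k + 1) * bpf]) for k in range(n_frames)]


# A short record with 21 detected beats (hypothetical numbers)
beats = list(range(21))
for bpf in (3, 5, 7):
    print(bpf, "bpf ->", len(frame_r_peaks(beats, bpf)), "EKMs")
# 3 bpf -> 7 EKMs
# 5 bpf -> 4 EKMs
# 7 bpf -> 3 EKMs
```

Under this assumption the number of EKMs per record is simply `n_beats // bpf`, which makes the trade-off explicit: richer images (more beats each) come at the cost of fewer training samples.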
  
  Once the EKM datasets are constructed, a simple yet effective Convolutional Neural Network (CNN) is built from one 2D convolution with ReLU activation, a max-pooling operation followed by dropout for regularisation, and, finally, flatten and dense layers with a softmax or sigmoid output, depending on whether the classification task is categorical or binary. With this simple CNN, the feasibility and adaptability of ELEKTRA are demonstrated in all the experiments.
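The layer sequence described above can be made explicit with a minimal NumPy sketch of the forward pass (inference, plus optional dropout). It is not the thesis model: the 32x32 single-channel input, 8 filters, 3x3 kernel, 2x2 pooling, dropout rate of 0.25, and random weights are hypothetical choices, and training is not shown. In practice such a network would be defined and trained in a deep-learning framework; this version only spells out each operation.

```python
import numpy as np


def conv2d_relu(x, kernels, bias):
    """Valid 2D cross-correlation (what 'Conv2D' layers compute) followed by ReLU.

    x: (H, W) single-channel EKM image; kernels: (F, kh, kw); bias: (F,).
    """
    n_filters, kh, kw = kernels.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((n_filters, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + kh, j:j + kw]
            # one dot product per filter over the kh x kw patch
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool_2x2(x):
    """2x2 max-pooling with stride 2; an odd trailing row/column is dropped."""
    f, h, w = x.shape
    x = x[:, :h - h % 2, :w - w % 2]
    return x.reshape(f, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def dropout(x, rate, rng, training):
    """Inverted dropout: active only while training, identity at inference."""
    if not training or rate == 0.0:
        return x
    rng = rng if rng is not None else np.random.default_rng()
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def forward(ekm, params, rng=None, training=False, dropout_rate=0.25):
    """Conv2D+ReLU -> MaxPool -> Dropout -> Flatten -> Dense -> softmax/sigmoid."""
    h = conv2d_relu(ekm, params["kernels"], params["conv_bias"])
    h = max_pool_2x2(h)
    h = dropout(h, dropout_rate, rng, training)
    logits = params["dense_w"] @ h.ravel() + params["dense_b"]
    if logits.shape[0] == 1:
        # binary task: a single sigmoid unit
        return 1.0 / (1.0 + np.exp(-logits))
    # categorical task: numerically stable softmax over the enrolled users
    e = np.exp(logits - logits.max())
    return e / e.sum()


def init_params(input_shape, n_filters, kernel_size, n_classes, seed=0):
    """Random weights for illustration only (hypothetical sizes, untrained)."""
    rng = np.random.default_rng(seed)
    conv_h = input_shape[0] - kernel_size + 1
    conv_w = input_shape[1] - kernel_size + 1
    flat = n_filters * (conv_h // 2) * (conv_w // 2)
    return {
        "kernels": rng.normal(0.0, 0.1, (n_filters, kernel_size, kernel_size)),
        "conv_bias": np.zeros(n_filters),
        "dense_w": rng.normal(0.0, 0.01, (n_classes, flat)),
        "dense_b": np.zeros(n_classes),
    }


params = init_params((32, 32), n_filters=8, kernel_size=3, n_classes=10)
probs = forward(np.random.default_rng(1).random((32, 32)), params)
print(probs.shape)  # (10,)
```

With a single convolutional layer the parameter count is dominated by the dense layer (here 10 x 8 x 15 x 15 weights), which is one way to see why such a network is cheap to train compared with deep architectures.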
  
  The four databases are tested in Chapters 3, 4, and 5, where the experimentation takes place. In Chapter 3, the NSRDB is studied as the baseline of identification with control users, and different experiments are conducted to study ELEKTRA's behaviour. The first experiments study how many heartbeats are needed to identify a user and the convergence costs of the model, in terms of computing time, depending on the number of heartbeats represented in each EKM. All of these experiments achieve similar results, with accuracies close to 100%. In the classification of a non-seen user, a user from a different database who has not been seen in any other experiment is processed and tested against the network. The result is that a non-seen user or impersonator would bypass the system only one time in ten, which can be considered a low ratio given that many systems block access after three to five attempts. Next, to approximate a scenario in which a low-cost sensor is used, an EKG signal is modified by adding Gaussian noise and then processed like any other signal. An accuracy of 99% is obtained, demonstrating the robustness of the system and indicating that noisy signals can also be processed. The last experiment on the NSRDB tests the feasibility of ELEKTRA by studying how many images (EKMs) are enough to identify a user. Although accuracy decreases as the number of training images decreases, an accuracy of 97% is obtained when training the network with only 300 EKMs per user. This chapter concludes that, as shown in all the experiments, ELEKTRA is a valid and feasible identification method for control users.
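The noise experiment can be reproduced in spirit with the sketch below, which adds zero-mean white Gaussian noise to a signal at a chosen signal-to-noise ratio (SNR). The abstract does not state the noise level used, so the SNR parameterisation and the function names `add_gaussian_noise` and `measured_snr_db` are assumptions; a sine wave stands in for a real EKG record.

```python
import numpy as np


def add_gaussian_noise(signal, snr_db, rng=None):
    """Return `signal` plus zero-mean white Gaussian noise at a target SNR in dB."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(signal, dtype=float)
    signal_power = np.mean(x ** 2)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR of `noisy` with respect to the known `clean` signal."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noisy, dtype=float) - clean
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))


# Stand-in for an EKG record: 400 s of a 1.2 Hz sine sampled at 250 Hz
t = np.arange(100_000) / 250.0
clean = np.sin(2 * np.pi * 1.2 * t)
noisy = add_gaussian_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
print(measured_snr_db(clean, noisy))  # close to 20 dB
```

The noisy record would then go through the same R-peak detection and EKM construction as any other signal.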
  
  The MIT-BIHDB comprises patients with arrhythmia together with healthy users, and the PTBDB comprises patients with different Cardiovascular Diseases (CVD) together with healthy users. Hence, the main goal of Chapter 4 is to study the proposed identification system on users with CVD, showing ELEKTRA's adaptability. First, the MIT-BIHDB is tested, achieving strong results and showing that ELEKTRA is capable of identifying a pool of users with and without arrhythmia with only a slight decrease in accuracy (97%). Second, the whole PTBDB is used to test the biometric system. The accuracy obtained in this experiment is lower than in the others (93%), as far fewer images are available to train the network than in the other experiments and 232 users are studied. Lastly, ELEKTRA is tested on 162 PTBDB users with specific CVDs, namely Bundle branch block, Cardiomyopathy, Dysrhythmia, Myocardial infarction, Myocarditis, and Valvular heart disease. The aim of this experiment is to observe ELEKTRA's behaviour when only users with CVD are included. Better results are obtained than in the previous experiment. This may be because the number of users has decreased, and because a CVD makes each EKG more distinctive, which is consistent with many researchers using the EKM for diagnosis. The conclusion drawn from all the experiments in this chapter is that ELEKTRA is capable of identifying users with and without CVD, approaching a real-life scenario.
  
  In Chapter 5, the GUDB is tested to evaluate the performance of user identification when users perform different activities. The GUDB comprises 25 users performing five activities with different levels of cardiovascular effort: sitting, walking on a treadmill, doing a maths exam, using a handbike, and running on a treadmill (jogging). The proposed biometric system is tested with each of these activities for 3 and 5 bpf, achieving different results in each case. For activities requiring lower cardiovascular effort, such as sitting or walking, accuracy is close to 100% (99.19% for sitting and 98.59% for walking). For scenarios where higher heart rates are expected, accuracy is lower: 82.63% for jogging and 95.51% for biking. The maths scenario behaves differently, since each user's heart rate may vary depending on how nervous they are; an accuracy of 94.0% is obtained for this activity. The conclusion drawn from these first experiments is that it is more complex to identify users when they perform an activity that requires a higher cardiovascular effort and, consequently, have a higher heart rate. In the following experiment, all scenarios are merged to study the behaviour of a system trained with users performing different activities. In this case, the results are close to the mean of the previous ones, with an overall accuracy of 91.32% for all scenarios with 3 bpf. In the subsequent experiments, some of the scenarios are merged into two categories. On the one hand, the calmer activities (sitting and walking) are merged into the so-called Low Cardiovascular Activity (LCA) scenario. Training and testing with these two activities together yields an accuracy of 97.74% and an Equal Error Rate (EER) of 1.01%. 
On the other hand, the High Cardiovascular Activity (HCA) scenario is composed of activities that require a higher cardiovascular effort (jogging and biking). In this case, the results decrease compared with the LCA scenario, with an accuracy of 85.71%. Notably, what increases considerably is the False Rejection Rate (FRR), which reaches 14.17%, without any increase in the False Acceptance Rate (FAR), which remains very low at 0.6%. The last experiments, called the "fight of scenarios", confront scenarios with each other by merging some of them, training with some activities or scenarios and predicting with different ones. The first experiments in this section train with the LCA group and test with the HCA group, and vice versa. The results show a large drop in performance, with accuracies of 37.24% and 46.42%, respectively. This implies that it is more complex to identify users who were enrolled at a different heart rate. Last but not least, a set of experiments confronts individual activities, such as training the network with the sitting scenario and testing with the jogging scenario. These experiments confirm the hypothesis that users are harder to identify at higher heart rates, even more so when the network has been trained on calm users. Nevertheless, one of the main advantages of the presented model is that, even when accuracy is low, the False Acceptance Rate does not increase compared with the other experiments, meaning that an impostor could not bypass the system.
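The error rates reported above can be computed from classifier scores as in the following sketch. It assumes that each identification attempt yields a score for the claimed identity (for example, the CNN's softmax probability), that genuine attempts should be accepted and impostor attempts rejected, and that the EER is approximated at the threshold where FAR and FRR are closest; the exact protocol used in the thesis may differ. The scores in the usage lines are invented for illustration.

```python
import numpy as np


def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR and FRR at one decision threshold (accept if score >= threshold)."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine users wrongly rejected
    return far, frr


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Returns (eer, threshold) at the point where |FAR - FRR| is smallest;
    the EER is taken as the mean of FAR and FRR at that point.
    """
    candidates = np.unique(np.concatenate([genuine_scores, impostor_scores]))
    best_gap, best_eer, best_t = np.inf, None, None
    for t in candidates:
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_t = abs(far - frr), (far + frr) / 2, float(t)
    return best_eer, best_t


# Hypothetical scores for the claimed identity
genuine = [0.9, 0.8, 0.7, 0.6]
impostor = [0.1, 0.2, 0.3, 0.65]
print(far_frr(genuine, impostor, 0.5))      # (0.25, 0.0)
print(equal_error_rate(genuine, impostor))  # (0.25, 0.65)
```

The sketch also shows the pattern described for the HCA and cross-scenario experiments: if genuine scores drop (users enrolled at rest but tested at a higher heart rate) while impostor scores stay low, FRR rises but FAR, which depends only on impostor scores, does not.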
  
  Lastly, Chapter 6 offers conclusions and discussion, including a comparison between ELEKTRA and other EKG-based biometric systems from the current state of the art. These works are examined to show how ELEKTRA outperforms all of them with respect to aspects such as efficiency, complexity, accuracy, error rates, and reproducibility, among others. It is important to remark that, compared with the other works, all experiments performed in this doctoral thesis achieve high performance, with high accuracies and low error rates. What is remarkable is that this performance is obtained using a very simple CNN consisting of just one convolutional layer. By achieving outstanding results with a simple neural network, the soundness of ELEKTRA is demonstrated.
  
  In this way, ELEKTRA contributes to the state of the art by providing a new method for user identification with EKGs with many benefits. Outstanding results in terms of high accuracy and low error rates in the experiments support the efficiency of ELEKTRA. Because the databases used in this doctoral thesis are publicly available, this work is reproducible, in contrast to many other works in the literature. Moreover, since the databases differ in the nature of the users that make up each one, the proposed identification method can be considered inclusive: all living beings have their own EKG, and high accuracies are also obtained when testing the model on users with different CVDs. Furthermore, as it has been shown that users with CVD can be identified without major drawbacks, ELEKTRA offers an identification system that can also offer a diagnosis of the user being identified in terms of their medical health. In addition, thanks to the GUDB, ELEKTRA shows, for the first time as far as the literature reaches, that EKG-based identification of users performing activities that require a higher cardiovascular effort, and who therefore have higher heart rates, is more complex.
  
  In conclusion, based on the studies and experiments performed in this doctoral thesis, ELEKTRA can be considered a feasible and efficient identification method for EKG biometrics that outperforms current state-of-the-art proposals in EKG-based user identification.