
Documat


Abstract of Simple Meta-optimization of the Feature MFCC for Public Emotional Datasets Classification

Enrique Antonio de la Cal Marín, Alberto Gallucci, José Ramón Villar Flecha, Kaori Yoshida, Mario Koeppen

  • A Speech Emotion Recognition (SER) system can be defined as a collection of methodologies that process and classify speech signals to detect the emotions embedded in them [2]. Among the most critical issues to consider in an SER system are: i) defining the kinds of emotions to classify, ii) finding suitable datasets, iii) selecting the proper input features and iv) optimising the selected features. This work considers four well-known datasets from the literature: EmoDB, TESS, SAVEE and RAVDESS. The study focuses on designing a low-power SER algorithm that combines one prosodic feature with six spectral features to capture rhythm and frequency. The proposal compares eleven low-power Classical Machine Learning (CML) classification techniques; the main novelty is the optimisation of the two main parameters of the MFCC spectral feature, the number of coefficients (n_mfcc) and the hop length (hop_length), through the meta-heuristic technique SA. The resulting algorithm could be deployed on low-cost embedded systems with limited computational power, such as a smart speaker. In addition, the proposed SER algorithm is validated on four well-known SER datasets. For the studied datasets, the models obtained with the eleven CML techniques and the optimised MFCC features clearly outperform (by more than 10%) the baseline models obtained with the non-optimised MFCC.
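
The abstract describes tuning two MFCC parameters (the number of coefficients and the hop length) with the meta-heuristic SA. The following is a minimal sketch of what such a search could look like, assuming SA denotes simulated annealing; it is not the authors' implementation. The function names, search ranges, cooling schedule and the `toy_score` objective are all hypothetical. In a real pipeline the objective would extract MFCCs with the candidate parameters and return a validation accuracy for one of the classifiers.

```python
import math
import random


def simulated_annealing(score, n_mfcc_range, hop_choices,
                        steps=200, t0=1.0, cooling=0.98, seed=0):
    """Maximise score(n_mfcc, hop_length) over a small discrete grid.

    All defaults here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    lo, hi = n_mfcc_range

    # State = (n_mfcc value, index into hop_choices); start at a random point.
    current = (rng.randint(lo, hi), rng.randrange(len(hop_choices)))
    current_score = score(current[0], hop_choices[current[1]])
    best, best_score = current, current_score

    temp = t0
    for _ in range(steps):
        # Neighbour: shift n_mfcc by up to 2 and the hop index by up to 1,
        # clamping both to the allowed range.
        n = min(hi, max(lo, current[0] + rng.randint(-2, 2)))
        h = min(len(hop_choices) - 1, max(0, current[1] + rng.randint(-1, 1)))
        cand_score = score(n, hop_choices[h])

        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        delta = cand_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = (n, h), cand_score
            if current_score > best_score:
                best, best_score = current, current_score

        temp *= cooling  # geometric cooling schedule

    return best[0], hop_choices[best[1]], best_score


def toy_score(n_mfcc, hop_length):
    """Synthetic stand-in for classifier accuracy (NOT data from the paper).

    It peaks at n_mfcc=40, hop_length=512 purely so the search has
    something to find.
    """
    return (1.0
            - ((n_mfcc - 40) / 100) ** 2
            - ((math.log2(hop_length) - 9) / 10) ** 2)


if __name__ == "__main__":
    hops = [128, 256, 512, 1024, 2048]
    n_mfcc, hop_length, value = simulated_annealing(toy_score, (10, 80), hops)
    print(n_mfcc, hop_length, round(value, 4))
```

Keeping the objective as a plain callable means the same loop can wrap any of the compared classifiers; only `toy_score` would need to be replaced by a function that trains and evaluates a model on features computed with the candidate parameters.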

