Abstract of "Un modelo inteligente de interacción natural adaptativo basado en visión artificial" (An Intelligent Adaptive Natural Interaction Model Based on Computer Vision)

Juan Jesús Ojeda Castelo

Spanish (translated)
There are currently several ways of interacting with computing devices. The most widespread are keyboard and mouse on PCs, gamepads in video games, and touch on smartphones and tablets. However, natural gestural interaction, which does not require the user to wear or handle a physical device, would offer several advantages in terms of adaptability, accessibility and usability. Accessibility in particular would benefit users with functional diversity, for whom the most widespread traditional modes of interaction can be impracticable in some cases because of their physical limitations.

This doctoral thesis focuses on the development of a natural interaction system characterized by being low-cost, adaptable and intelligent. Its content is divided into three clearly distinct parts, which describe the stages in the development of the system and the different devices used in each. First, a system was created that uses Microsoft Kinect v1 as its natural interaction device, allowing users to control it through the movement of their body. This system is composed of two modules: the first is oriented mainly towards the individual's physical abilities, while the second focuses on cognitive abilities. This part of the work was carried out in collaboration with the Princesa Sofía Special Education Center in the province of Almería; the center's own students were the participants in the study, which made it possible to verify the validity of the system. For the evaluation, a group of experts completed a survey assessing usability, educational modality and student behavior. In addition, experiments were conducted with users to measure indicators such as the time taken and the number of errors made when completing an activity. These results supported conclusions about the system that will help to improve it.

Second, the main objective was the adaptation of the interaction. The sensor used was Microsoft Kinect v2, given the satisfactory experience with its previous version. The main contribution was the design of a device-interaction model that makes it possible to adapt the interaction and to try to generalize it to a larger number of users. The activities proposed for this stage were designed in collaboration with the teachers of the Princesa Sofía Special Education Center. One of the activities was intended to have students associate concepts from a didactic unit; another was created to work on left-right laterality. Two types of evaluation were carried out: one with experts and one with end users. The expert evaluation applied an inspection method combining the cognitive walkthrough with the think-aloud technique. Students with physical, hearing and visual disabilities and with autism took part in the end-user evaluation. This evaluation consisted of two iterations in which the students performed the activities while a set of parameters was recorded in order to draw conclusions.

Finally, the Kinect device was set aside in favor of a study based on a webcam. This decision was driven mainly by the uncertain future of the Microsoft Kinect device, as well as by the aims of reducing acquisition cost and improving ease of use. On this premise, a hand gesture recognition system based on Deep Learning and Fuzzy Logic was developed to determine the best classification models. First, the data to be used for the subsequent training of the Deep Learning models were obtained; for this purpose, the videos were taken from a hand gesture database called 20BN-Jester. Transfer learning with fine-tuning was then applied to a series of pre-trained models so that they learn to classify hand gestures. A total of 104 experiments were carried out, varying parameters such as the optimizer, the number of gestures and the loss function. Metrics obtained from these experiments were then used as the inputs to a fuzzy expert system. This expert system implements a Takagi-Sugeno-Kang system and consists of 11 rules. These rules are used to analyze each configuration and produce a ranking of the configurations in descending order of the score assigned by the fuzzy expert system.

The research carried out in this thesis has led to a total of 6 scientific contributions: 4 at international conferences, published in the Springer series Advances in Intelligent Systems and Computing, and 2 in the international impact journal Multimedia Tools and Applications (Springer, JCR Q2, Computer Science).

To conclude, this thesis leaves the following lines of research open: (a) the development of a natural interaction system that integrates Microsoft Azure Kinect as a means of interaction; (b) the creation of an interaction system that is portable and can be deployed in various scenarios; (c) the development of a self-adaptive system that adapts the gestures to the characteristics of the users; (d) the development of a hybrid Artificial Intelligence system to achieve better performance in gesture recognition; and (e) the creation of a multimodal system that incorporates several modes of interaction, for example, gesture recognition and voice recognition.

Acknowledgments: TIN2017-83964-R, "Study of a holistic approach for the interoperability and coexistence of dynamic systems: Implication in Smart Cities models".
English
There are currently various ways of interacting with computing devices. The most widespread are keyboard and mouse on PCs, gamepads in video games, and touch on smartphones and tablets. However, natural gestural interaction, without the need to wear or handle a physical device, would offer various advantages in terms of adaptability, accessibility and usability for the user. Accessibility in particular would benefit users with special needs, for whom the most widespread traditional modes of interaction can be impractical in some cases because of their physical limitations.

This dissertation consists of the development of a natural interaction system that is characterized by being low-cost, adaptable, intelligent and portable. The content has been divided into three parts, each explaining a stage in the development of the system, since a different device is used in each stage. The first phase involves creating a system that uses Microsoft Kinect v1 as its natural interaction device, so that users can control the software by moving their body. This software is composed of a set of activities divided into two modules: the first module is mainly oriented towards the physical abilities of the individual, while the second focuses on cognitive abilities. We collaborated with the Princesa Sofía Special Education Center in the province of Almería, whose students were the participants in the study carried out to check the validity of the system. In this study, a panel of experts completed a survey evaluating the usability, the educational modality and the behavior of the students. In addition, experiments were conducted in which users completed the activities while the time taken and the number of errors were measured, yielding specific information about the system.

In the next phase, the main objective was the adaptation of the interaction. The sensor used was Microsoft Kinect v2, given the satisfactory experience with its previous version. The main focus was the design of a device-interaction model able to adapt the interaction, through the characteristics of the device, to a greater number of users. The activities proposed for this stage were again designed in collaboration with the teachers of the Princesa Sofía Special Education Center. One of the activities involved the students associating concepts from a didactic unit. Another activity was created to help the students work on left-right laterality.

Two types of evaluation were conducted at this stage: one with experts and one with end users. In the evaluation with experts, the inspection method applied combined the cognitive walkthrough with the think-aloud technique. Students with physical, hearing and visual disabilities and with autism participated in the evaluation with end users. This evaluation consisted of two iterations in which the students carried out the activities while a series of parameters was recorded in order to draw conclusions.

In the last phase, the Kinect device used so far was set aside and a study using a webcam was conducted instead. This decision was mainly due to the uncertain future of the Microsoft Kinect device, together with the aims of reducing the cost of ownership and improving ease of use. On this premise, Deep Learning and Fuzzy Logic were applied to classify hand gestures and to determine the best configuration among all those tested. First, the data for the subsequent training of the Deep Learning models were obtained; for this purpose, the videos were taken from a database of hand gestures entitled 20BN-Jester. Once the data had been collected, transfer learning with fine-tuning was applied to a series of pre-trained models so that the system learns to classify hand gestures. In total, 104 experiments were carried out, varying different parameters, including the optimizer, the number of gestures and the loss function.
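Each of the experiments described above is one combination of training settings. A minimal sketch of how such a sweep can be enumerated is shown below, assuming a full Cartesian product of option lists. The option values, the backbone names and the `ExperimentConfig`/`build_grid` names are hypothetical placeholders, not the thesis's actual settings, and this small grid does not reproduce the 104 runs; the only dataset fact used is that 20BN-Jester defines 27 gesture classes.

```python
from dataclasses import dataclass
from itertools import product
from typing import List

JESTER_NUM_CLASSES = 27  # 20BN-Jester defines 27 gesture classes


@dataclass(frozen=True)
class ExperimentConfig:
    """One fine-tuning run: pre-trained backbone, training settings, label subset size."""
    backbone: str
    optimizer: str
    num_gestures: int
    loss: str


# Hypothetical placeholder search space (not the values used in the thesis).
BACKBONES = ["pretrained_cnn_a", "pretrained_cnn_b"]
OPTIMIZERS = ["sgd", "adam"]
NUM_GESTURES = [5, 10, JESTER_NUM_CLASSES]
LOSSES = ["cross_entropy", "label_smoothing_ce"]


def build_grid() -> List[ExperimentConfig]:
    """Enumerate the full Cartesian product of the option lists, one config per run."""
    grid = [
        ExperimentConfig(backbone, optimizer, n, loss)
        for backbone, optimizer, n, loss in product(BACKBONES, OPTIMIZERS, NUM_GESTURES, LOSSES)
    ]
    # Sanity check: a gesture subset cannot exceed the classes the dataset provides.
    assert all(1 <= c.num_gestures <= JESTER_NUM_CLASSES for c in grid)
    return grid


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid), "runs")  # 2 * 2 * 3 * 2 = 24 with these placeholders
    print(grid[0])
```

In a real sweep, each config would be passed to a training routine and its resulting metrics stored per run; a per-run metrics table of that kind is what a downstream ranking step consumes.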

Metrics obtained from these experiments are the inputs to the fuzzy expert system. This expert system implements a Takagi-Sugeno-Kang system and is made up of 11 rules. These rules are used to analyze each of the configurations and produce a ranking of them in descending order of the score assigned by the fuzzy expert system.
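A Takagi-Sugeno-Kang (TSK) system computes a crisp output as the firing-strength-weighted average of its rule consequents, score = Σ w_i z_i / Σ w_i. The sketch below is a minimal zero-order TSK ranker (constant consequents, product as the AND operator) written under stated assumptions: the abstract does not name the input metrics or list the 11 rules, so the two inputs (validation accuracy and normalized training time, both in [0, 1]), the triangular membership functions, the five rules and the demo configurations with their numbers are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Memberships = Dict[str, float]


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x: float) -> Memberships:
    """Three overlapping triangular sets; on [0, 1] their memberships sum to 1."""
    return {
        "low": tri(x, -0.5, 0.0, 0.5),
        "medium": tri(x, 0.0, 0.5, 1.0),
        "high": tri(x, 0.5, 1.0, 1.5),
    }


# Hypothetical rule base: (firing-strength function, constant consequent z_i).
RULES: List[Tuple[Callable[[Memberships, Memberships], float], float]] = [
    (lambda acc, t: acc["high"] * t["low"], 1.00),     # accurate and fast
    (lambda acc, t: acc["high"] * t["medium"], 0.85),  # accurate, moderate cost
    (lambda acc, t: acc["high"] * t["high"], 0.70),    # accurate but slow
    (lambda acc, t: acc["medium"], 0.40),              # mediocre accuracy
    (lambda acc, t: acc["low"], 0.00),                 # poor accuracy
]


def tsk_score(accuracy: float, norm_time: float) -> float:
    """Zero-order TSK output: weighted average of rule consequents."""
    acc, t = fuzzify(accuracy), fuzzify(norm_time)
    weights = [strength(acc, t) for strength, _ in RULES]
    total = sum(weights)
    if total == 0.0:  # no rule fired
        return 0.0
    return sum(w * z for w, (_, z) in zip(weights, RULES)) / total


def rank(configs: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Score every configuration and sort in descending order of score."""
    scored = [(name, tsk_score(a, t)) for name, (a, t) in configs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    demo = {"cfg_A": (0.90, 0.20), "cfg_B": (0.90, 0.80), "cfg_C": (0.60, 0.10)}
    for name, score in rank(demo):
        print(f"{name}: {score:.3f}")  # cfg_A: 0.832, cfg_B: 0.688, cfg_C: 0.514
```

Zero-order consequents keep the example small; a first-order TSK system would replace each constant z_i with a linear function of the inputs, and the weighted-average step would stay the same.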

As a result of the research carried out in this thesis, a total of 6 scientific contributions have been obtained: 4 at international conferences, published in the Springer series Advances in Intelligent Systems and Computing, and 2 in the international impact journal Multimedia Tools and Applications (Springer, JCR Q2, Computer Science).

In conclusion, the completion of this thesis leaves the following lines of research open: (a) the development of a natural interaction system that integrates Microsoft Azure Kinect as a means of interaction; (b) the creation of an interaction system that is portable and can be used in various settings; (c) the development of a self-adaptive system that adapts the gestures to the characteristics of the users; (d) the development of a hybrid artificial intelligence system to improve performance in gesture recognition; and (e) the creation of a multimodal system that incorporates various modes of interaction, for example, gesture recognition and voice recognition.

Acknowledgments: TIN2017-83964-R, Study of a holistic approach for the interoperability and co-existence of dynamic systems: Implication in Smart Cities models.