Augmented Reality (AR) and its ability to integrate synthetic content over a real image provides invaluable value in various fields; however, the industry is one of these fields that can benefit most from it. As a key technology in the evolution towards Industry 4.0 and 5.0, AR not only complements but also enhances human interaction with industrial processes. In this context, AR becomes an essential tool that does not replace the human factor but enriches it, expanding its capabilities and facilitating more effective collaboration between humans and technology. This integration of AR in industrial environments not only improves the efficiency and precision of tasks but also opens new possibilities for expanding human potential.
There are numerous ways in which humans interact with technology, with AR being one of the most innovative paradigms in how users access information; however, it is crucial to recognize that AR, by itself, has limitations in terms of interpreting the content it visualizes. Although today we can access different libraries that use algorithms for image, object, or even environment detection, a fundamental question arises: To what extent can AR understand the context of what it sees? This question becomes especially relevant in industrial environments. Can AR discern if a machine functions correctly, or is its role limited to presenting superimposed digital indicators? The answer to these questions underscores both the potential and the limits of AR, driving the search for innovations that allow for greater contextual understanding and adaptability to specific situations within the industry.
At the core of this thesis lies the objective of not only endowing AR with "semantic intelligence" capable of interpreting and adapting to context, but also of expanding and enriching the ways users interact with this technology. This approach mainly aims to improve the accessibility and efficiency of AR applications in industrial environments, which are by nature restricted and complex. The intention is to go beyond the traditional limits of AR, providing more intuitive and adaptive tools for operators in these environments.
The research unfolds through three articles, where a progressive multimodal architecture has been developed and evaluated. This architecture integrates various user-technology interaction modalities, such as voice control, direct manipulation, and visual feedback in AR. In addition, advanced technologies based on Machine Learning (ML) and Deep Learning (DL) models are incorporated to extract and process semantic information from the environment. Each article builds upon the previous one, demonstrating an evolution in AR's ability to interact more intelligently and contextually with its environment, and highlighting the practical application and benefits of these innovations in the industry.
© 2008-2025 Fundación Dialnet · Todos los derechos reservados