Abstract of Scene structure recovery from omnidirectional and depth cameras for assistive computer vision

Alejandro Pérez Yus

Computer vision is a rapidly expanding field of research that aims to develop algorithms to extract information about the environment from images. The applications of this field are countless, and continuous advances point to a bright future ahead. Nowadays computer vision is applied to many areas, such as autonomous driving, drones or robotics. In particular, our motivation lies within the context of human assistance, where computer vision techniques are developed to be of service to people, especially people with disabilities. In this thesis, we focus on the problem of developing a camera-based assistant that allows visually impaired people to move safely and efficiently. Popular low-tech aids for the blind, such as the white cane, work well for simple, short-range situations, but we believe technological advances can enhance the overall experience by providing useful information. In particular, the advent of consumer RGB-D cameras a few years ago and their ongoing miniaturization make a wearable system based on this type of camera affordable and plausible in the near future. We use RGB-D cameras as the central element of our system, since they provide three-dimensional information alongside color, making them especially useful for detecting shapes and obstacles. Additionally, we have also explored their combination with omnidirectional cameras, creating powerful new systems able to capture more of the environment at once.

A main goal of this thesis is to develop a variety of methods to recover relevant information of the scene that helps in human navigation. Since this problem is too general, we focused on extracting common structural information that can be found in any man-made environment. In particular, we developed a system able to find the floor and the orientation of the scene, in which obstacles along the way can be detected using depth perception. We extended this navigation system into our first major contribution: a stair detection, modeling and traversal algorithm. The proposed method detects ascending and descending staircases, obtains their orientation and dimensions, and continuously re-localizes the subject during traversal using a visual odometry algorithm running in parallel.
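The abstract does not detail how the floor and scene orientation are obtained; a minimal sketch of one standard approach, fitting the floor plane to an RGB-D point cloud with RANSAC and flagging points above it as potential obstacles (Python/NumPy; the names, thresholds and axis conventions below are illustrative assumptions, not the thesis method):

    import numpy as np

    def fit_floor_plane(points, iters=200, dist_thresh=0.03, rng=None):
        """RANSAC plane fit over an Nx3 point cloud (meters).
        Returns (normal, d) for the plane n.x + d = 0 with the most inliers."""
        rng = rng or np.random.default_rng(0)
        best_inliers, best_plane = 0, None
        for _ in range(iters):
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p1 - p0, p2 - p0)
            norm = np.linalg.norm(n)
            if norm < 1e-9:          # degenerate (collinear) sample
                continue
            n = n / norm
            d = -n @ p0
            inliers = np.sum(np.abs(points @ n + d) < dist_thresh)
            if inliers > best_inliers:
                best_inliers, best_plane = inliers, (n, d)
        return best_plane

    def camera_tilt(floor_normal):
        """Angle between the floor normal and the camera up axis, assuming the
        usual x-right, y-down, z-forward RGB-D camera convention."""
        n = floor_normal / np.linalg.norm(floor_normal)
        up = np.array([0.0, -1.0, 0.0])
        return np.arccos(np.clip(abs(n @ up), 0.0, 1.0))

    def obstacle_points(points, plane, min_height=0.05, max_height=2.0):
        """Keep points whose height above the floor lies in a walking-relevant band."""
        n, d = plane
        if d < 0:                    # orient the normal so the camera sits above the plane
            n, d = -n, -d
        height = points @ n + d      # signed height above the floor
        return points[(height > min_height) & (height < max_height)]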

Additionally, in order to enable a better understanding of the scene, we propose to estimate the layout of the room, which could help in tasks such as navigation, scene recognition or object detection. However, one of the main limitations of RGB-D cameras is their narrow field of view. To overcome this limitation, our next contribution was to develop and calibrate novel camera systems that extend the field of view by means of omnidirectional cameras. In particular, we developed a hybrid fisheye and depth camera system and the corresponding calibration method for this type of system. Moreover, to extend the possibilities to other systems, we developed a second calibration method based on line observations, able to calibrate multiple camera combinations without requiring overlapping fields of view or building a calibration pattern. Our hybrid camera system opens new possibilities, such as a layout estimation method able to obtain full-scale 3D reconstructions of the scene by combining the wide field of view of the fisheye camera with the metric data of the depth camera, which is also a relevant contribution.

Communicating the perceived information to the user is a complex problem that many assistive computer vision systems in the literature do not address directly.

Given recent advances in prosthetic vision, we investigated the application of the developed computer vision techniques to this area. Patients with a visual prosthesis are able to see an array of light dots, called phosphenes. However, current phosphene arrays have limited spatial resolution and dynamic range. Our next major contribution tackles the challenging task of encoding the extracted information into phosphene patterns, with an iconic representation that makes it possible to understand the scene despite these limitations. In particular, we use the depth camera to extract the free walking space in the scene, and the iconic approach provides comfortable and informative visual cues that other existing methods usually lack.
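To make the resolution and dynamic-range constraints concrete, here is a minimal sketch of rendering a depth-derived free-space mask onto a coarse phosphene grid (Python/NumPy; the grid size, number of brightness levels and mapping are illustrative assumptions, not the iconic representation proposed in the thesis):

    import numpy as np

    def free_space_to_phosphenes(free_mask, grid=(10, 10), levels=4):
        """Down-sample a binary free-walking-space mask (H x W, 1 = traversable)
        to a coarse phosphene grid with a few brightness levels."""
        h, w = free_mask.shape
        gh, gw = grid
        # Crop so the mask divides evenly into grid cells, then average per cell.
        mask = free_mask[: (h // gh) * gh, : (w // gw) * gw].astype(float)
        cells = mask.reshape(gh, h // gh, gw, w // gw).mean(axis=(1, 3))
        # Quantize per-cell walkability to the available dynamic range.
        return np.round(cells * (levels - 1)).astype(int)

    # Toy usage: a mask where the lower-central region of the image is walkable.
    mask = np.zeros((240, 320), dtype=np.uint8)
    mask[120:, 80:240] = 1
    pattern = free_space_to_phosphenes(mask)   # 10x10 array of levels 0..3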

