Towards deep image understanding: from pixels to semantics

  • Author: Josep Maria Gonfaus Sitjes
  • Thesis supervisors: Jordi González Sabaté (dir.), Theo Gevers (dir.)
  • Defense: Universitat Autònoma de Barcelona (Spain), 2012
  • Language: English
  • ISBN: 978-84-940231-5-6
  • Thesis committee: Filiberto Pla Bañón (chair), David Masip Rodó (secretary), Johannes Christianus van Gemert (member)
  • Links
    • Open-access thesis at: TESEO
  • Abstract
    • Understanding the content of images is one of the greatest challenges of computer vision. Recognizing the objects that appear in images and identifying and interpreting their actions are the main purposes of image understanding.

      The image reflected on the retina of the eye (human or robotic), or, by extension, the image captured by a video or still camera, enables the observer to conceptualize its surroundings and therefore to interact with them. For example, for an intelligent robot or a smart car to function effectively, it is essential that they recognize their environment in order to navigate safely. Similarly, in the near future, web search engines will also need to recognize image content in order to index it.

      This thesis seeks to identify what is present in a picture. Our objective is to categorize and locate all objects within an image.

      First of all, to deepen our knowledge of how images are formed, we propose a method to recognize the physical properties involved in producing the image. By combining photometric and geometric information, we learn to distinguish between material edges and scene changes caused by shadows or light reflections.
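      The intuition behind this distinction can be sketched as a toy rule (this is an illustration, not the thesis's actual classifier; `classify_edge`, its inputs, and the threshold are assumptions): an edge that is strong in raw RGB but absent in an illumination-invariant channel is more likely caused by shadows or highlights than by a material change.

```python
def classify_edge(rgb_grad, invariant_grad, thresh=0.5):
    """Toy rule for edge classification.

    rgb_grad: gradient magnitude in the raw RGB image.
    invariant_grad: gradient magnitude in a photometric-invariant
    channel (e.g. normalized rgb), where shadow edges largely vanish.
    """
    if rgb_grad < thresh:
        return "no edge"
    # Material edges survive in illumination-invariant channels;
    # shadow/highlight edges mostly disappear there.
    return "material" if invariant_grad >= thresh else "shadow/highlight"
```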

      Then, entering the field of semantic object recognition, we want to identify the entities appearing in a given image. For this purpose, we focus on two ways of describing such objects. The first assigns an object category to each pixel, a task commonly known as semantic segmentation. The second, which belongs to the topic of object detection, aims to recognize and localize the whole object by placing an accurate bounding box around it.
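      The two output formats are closely related: a per-pixel label map can be collapsed into a bounding box. A minimal sketch (the 4×4 mask and `mask_to_bbox` are illustrative, not from the thesis):

```python
# Semantic segmentation output: one class label per pixel (1 = object).
seg_mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]

def mask_to_bbox(mask, cls=1):
    """Tight bounding box (x0, y0, x1, y1) around all pixels of `cls`,
    i.e. the detection-style description of the same object."""
    coords = [(x, y)
              for y, row in enumerate(mask)
              for x, v in enumerate(row) if v == cls]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```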

      Semantic segmentation focuses on resolving the ambiguity between categories at the pixel level. This is essentially achieved by adding contextual information. We propose three scale levels to resolve such ambiguity. At the low level, we learn whether a pixel's appearance resembles the object. At the middle level, we add information about the object as a whole entity. At the top level, we enforce consistency with the rest of the scene, introducing the concept of semantic co-occurrence.
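      The three levels can be pictured as evidence sources that are fused into one per-pixel class score. The following is a schematic sketch only; the weighted sum, the weights, and the function names are assumptions, not the thesis's actual model:

```python
def fuse_levels(pixel_score, object_score, scene_cooccur,
                weights=(0.5, 0.3, 0.2)):
    """Combine low-level (pixel appearance), middle-level (whole-object)
    and top-level (scene co-occurrence) evidence for one class."""
    w1, w2, w3 = weights
    return w1 * pixel_score + w2 * object_score + w3 * scene_cooccur

def label_pixel(scores_by_class):
    """Pick the class with the highest fused score.

    scores_by_class maps class name -> (pixel, object, scene) scores.
    """
    return max(scores_by_class,
               key=lambda c: fuse_levels(*scores_by_class[c]))
```

For example, a pixel whose local appearance weakly suggests "boat" can still be labeled "cow" when object- and scene-level evidence (a grass field, other cows) favors that class.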

      Finally, regarding object detection, we present two new methods. The first improves the object representation at the local level through the concept of factorized appearances: an object is represented by several parts, and each part can in turn be represented by more than one local appearance. The second method addresses the computational problem of identifying and locating thousands of object categories in an image. Its main advantage is that it builds object representations which can be reused across objects, reducing the computational cost for the remaining categories.
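      The reuse idea can be sketched as follows (a hypothetical illustration; `detect_many` and the lambda "part filters" are assumptions): each shared part filter is evaluated once per window, and every category simply sums the responses of the parts it reuses, so adding a category costs almost nothing.

```python
def detect_many(window_features, part_filters, category_parts):
    """Score many categories on one window, evaluating each shared
    part filter only once."""
    # Evaluate every part filter a single time.
    responses = {name: f(window_features) for name, f in part_filters.items()}
    # Each category reuses the cached responses of its parts.
    return {cat: sum(responses[p] for p in parts)
            for cat, parts in category_parts.items()}
```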

      The results have been validated on several commonly used benchmark datasets, achieving international recognition and state-of-the-art performance in the field.


Fundación Dialnet

Mi Documat

Opciones de tesis

Opciones de compartir

Opciones de entorno