Low-level vision for resource-limited devices

Xoan Iago Suárez Canosa

Authors: Xoan Iago Suárez Canosa
Thesis supervisors: Luis Baumela Molina (supervisor), José Miguel Buenaposada Biencinto (supervisor)
Defense: Universidad Politécnica de Madrid (Spain), 2021
Language: Spanish
Thesis examination committee: Carlos Orrite Uruñuela (chair), Javier de Lope Asiaín (secretary), Jose Francisco Velez Serrano (member), Xosé Manuel Pardo López (member), Pablo Márquez Neila (member)
Links
- Open-access thesis available at: Archivo Digital UPM
Abstract
- The advent of a panoply of resource-limited devices opens up new challenges in the design of computer vision algorithms, which must strike a clear compromise between accuracy and computational requirements. This thesis proposes several low-level algorithms on top of which new applications with better performance on limited devices can be built. We address the problems of local feature detection and description, which are a cornerstone of many computer vision pipelines.
  
  We first propose ELSED, the fastest line segment detector in the literature. The key to its efficiency is a local segment-growing algorithm that connects gradient-aligned pixels in the presence of small discontinuities. ELSED improves on both the execution time and the accuracy of competitors with similar computational requirements.
  
  Next, we introduce FSG, a method to group small segments into full lines that are more suitable for tasks such as vanishing point estimation. It is based on two independent components: a proposer that greedily clusters segments to suggest plausible line candidates, and a probabilistic model that decides whether a group of segments is an actual line. Unlike its competitors, FSG is able to group segments in real time while achieving state-of-the-art performance.
  
  Last, we study the problem of efficient local feature description, for which we propose several methods. We introduce an efficient feature description measurement based on the difference of mean gray levels between two square regions, together with a fast procedure to search for its optimal configuration. In our simplest proposals, BELID and BEBLID, we select the discriminative measurements by solving a binary classification problem with boosting. Our most elaborate and top-performing descriptors are BAD (Box Average Difference) and HashSIFT. They emerge from applying triplet ranking loss, hard negative mining, and anchor swapping to features based on pixel differences, such as the one we introduce in this thesis, and on image gradients. In our experiments, we evaluate the accuracy, execution time, and energy consumption of the proposed descriptors. We show that their results establish new operating points on the state-of-the-art accuracy vs. resources trade-off curve.
  
  The effectiveness of these methods is also supported by their adoption in industry and in the computer vision community. Specifically, as part of the Industrial PhD grant, the code has been integrated as a fundamental component in the pipeline of a visual localization system, and the open-source code has been published in the OpenCV library.
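
The feature measurement described in the abstract (the difference of mean gray levels between two square regions) can be illustrated with a minimal sketch. It assumes the box means are computed with an integral image (summed-area table), a common way to make such sums constant-time. The function names, the box encoding `(cx, cy, half)`, and the thresholding rule below are hypothetical choices for illustration and are not taken from the thesis or from the OpenCV implementation.

```python
def integral_image(img):
    """Build a summed-area table with an extra leading row and column of zeros.

    img is a list of rows of gray levels (illustrative input format).
    """
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_mean(sat, cx, cy, half):
    """Mean gray level of the square box centered at (cx, cy), side 2*half + 1.

    Uses four table lookups, so the cost does not depend on the box size.
    The caller must keep the box inside the image.
    """
    x0, y0 = cx - half, cy - half
    x1, y1 = cx + half + 1, cy + half + 1
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    side = 2 * half + 1
    return total / (side * side)


def box_difference(sat, box_a, box_b):
    """Difference of mean gray levels between two square boxes."""
    return box_mean(sat, *box_a) - box_mean(sat, *box_b)


def binary_descriptor(sat, box_pairs, thresholds):
    """Hypothetical binarization: one bit per box pair, set when the
    difference exceeds that pair's threshold."""
    return [
        1 if box_difference(sat, a, b) > t else 0
        for (a, b), t in zip(box_pairs, thresholds)
    ]


# Toy 4x4 patch: dark left half, bright right half.
patch = [[0, 0, 10, 10] for _ in range(4)]
sat = integral_image(patch)
pairs = [((3, 1, 0), (0, 1, 0)), ((0, 1, 0), (3, 1, 0))]
bits = binary_descriptor(sat, pairs, [0, 0])
```

In a learned descriptor, the box positions, sizes, and thresholds would come from training (boosting or a ranking loss, as the abstract describes) rather than being chosen by hand as in this toy patch.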