Human body parts segmentation via stacked and multi-task learning

Daniel Sanchez Abril

Authors: Daniel Sanchez Abril
Thesis supervisors: Xavier Baró Solé (supervisor), Sergio Escalera Guerrero (co-supervisor)
Defense: at the Universitat Oberta de Catalunya (Spain) in 2019
Language: Spanish
Thesis committee: Jordi González Sabaté (chair), David Masip (secretary), Santiago Seguí Mesquida (member)
Links
- Open-access thesis at: O2 Repositori UOC
Abstract
- Human body segmentation in RGB images has been a core problem in computer vision since the field's early days. The goal is to provide a complete segmentation of the body parts of the human or humans appearing in an image, discriminating the human body from the rest of the image. The problem is very challenging because of the high variability in the data: lighting conditions, clutter, clothing, appearance, background, point of view and number of visible body parts, among others. Even so, it has become an active area of research because of its potential in real applications (e.g., surveillance, medical imaging, sign language, interactive virtual reality systems).
  
  Hand-crafted approaches covered traditional methods such as simple template matching, deformable models, pictorial structures with tree and loopy models, and discriminative ensemble learning. These approaches led researchers to carry out rigorous studies to constrain the problem, either through the kinematic structure of the body or through the variability in poses and samples. With the appearance of deep learning, however, the traditional pipelines and methods have largely been replaced by Deep Convolutional Neural Networks in their different variations. As a result, deep-based methods have surpassed hand-crafted methods by a large margin, leading researchers to focus on deep-based methods and on their combination with traditional ones.
  
  The writing of this thesis coincided with this paradigm shift, and it is therefore organized into two distinct blocks. In the first block, we focus on a novel dataset in order to extend the state of the art in human pose estimation and body segmentation. Next, we present a novel two-stage approach for human body part segmentation. We propose to use a cascade of classifiers as body part detectors, combining their outputs in an Error-Correcting Output Codes (ECOC) framework. Once we obtain the body pose, we apply Graph Cut segmentation optimization. Then, we use HOG features to describe the dataset and train SVM classifiers, combined within the ECOC framework, to feed a Graph Cut body part segmentation approach.
  
  We also address full-body segmentation, but in a different way: we present a novel two-stage human body segmentation method based on the discriminative Multi-Scale Stacked Sequential Learning (MSSL) framework. In the first stage, a multi-class Error-Correcting Output Codes (ECOC) classifier is trained to detect body parts and to produce a soft likelihood map for each body part. In the second stage, a multi-scale decomposition of these maps and a neighborhood sampling are performed, resulting in a new set of features. This extensive feature set is used to train a Random Forest binary classifier in a stacked learning fashion. Finally, to obtain the binary human segmentation, a post-processing step based on Graph Cuts optimization is applied to the output of the binary classifier.
  
  In the second block of the thesis, we analyze four related human analysis tasks in still images in a multi-task scenario by leveraging synthetic datasets. Specifically, we study the correlation between 2D pose estimation, 3D pose estimation, body part segmentation, and full-body depth estimation. The main goal is to analyze how training these four related tasks together can benefit each task and improve generalization. Results show that all four tasks benefit from the multi-task approach, but with different combinations of tasks.
  
  In conclusion, this thesis shows the benefit of stacked and multi-task learning for the task of human body part segmentation in still images.
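
The ECOC step mentioned in the abstract, where the outputs of several binary body part detectors are combined into one multi-class decision, can be illustrated with a minimal sketch. The code matrix, the class names and the Hamming-distance decoding below are assumptions chosen for illustration; the abstract does not state which coding design or decoding rule the thesis uses.

```python
from typing import Dict, List, Sequence

# Hypothetical code matrix: one codeword (row) per body part class and
# one column per binary classifier. The values are illustrative only.
CODE_MATRIX: Dict[str, List[int]] = {
    "head":  [+1, +1, +1, -1, -1],
    "torso": [+1, -1, -1, +1, -1],
    "arm":   [-1, +1, -1, -1, +1],
    "leg":   [-1, -1, +1, +1, +1],
}


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two codewords disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def ecoc_decode(binary_outputs: Sequence[int],
                code_matrix: Dict[str, List[int]] = CODE_MATRIX) -> str:
    """Return the class whose codeword is closest to the binary outputs."""
    return min(code_matrix,
               key=lambda label: hamming_distance(code_matrix[label],
                                                  binary_outputs))
```

In this toy matrix every pair of codewords differs in at least three positions, so a single wrong binary classifier still decodes to the intended class. That is the error-correcting property the framework is named after.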
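
The second MSSL stage described in the abstract, a multi-scale decomposition of the likelihood maps followed by neighborhood sampling, can be sketched roughly as follows. This is not the thesis implementation: box averaging stands in for whatever smoothing the thesis uses, and the function names, the radii and the 3x3 sampling pattern are all hypothetical.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def box_smooth(m: Map2D, radius: int) -> Map2D:
    """Average each pixel over a (2*radius+1)^2 window, clipped at borders.

    Assumption: box averaging is a stand-in for the multi-scale
    decomposition; the abstract does not specify the filter.
    """
    h, w = len(m), len(m[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [m[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def stacked_features(m: Map2D, y: int, x: int,
                     radii: Sequence[int] = (0, 1, 2)) -> List[float]:
    """Sample a 3x3 neighborhood of pixel (y, x) at every scale.

    The sampling step grows with the scale, so coarser maps contribute
    context from farther away. Out-of-range samples are clamped.
    """
    h, w = len(m), len(m[0])
    feats: List[float] = []
    for r in radii:
        smoothed = box_smooth(m, r)  # recomputed per call for brevity
        step = r + 1
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                feats.append(smoothed[yy][xx])
    return feats
```

In a stacked setup, vectors like these (one per body part likelihood map) would be concatenated and passed to the second-stage classifier, which the abstract states is a Random Forest. A practical version would smooth each map once instead of once per pixel.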
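
The multi-task setting of the second block, in which different combinations of the four tasks are trained together, is often expressed as a weighted sum of per-task losses over the active tasks. The sketch below assumes that formulation. The task identifiers, the default weight of 1.0 and the function name are hypothetical, and the abstract does not say how the thesis combines its losses.

```python
from typing import Dict, Iterable, Optional

# Hypothetical identifiers for the four tasks named in the abstract.
TASKS = ("pose_2d", "pose_3d", "body_parts", "depth")


def multitask_loss(per_task_loss: Dict[str, float],
                   active: Iterable[str],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of the losses of the active tasks.

    Assumption: tasks without an explicit weight get weight 1.0.
    """
    weights = weights or {}
    total = 0.0
    for task in active:
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        total += weights.get(task, 1.0) * per_task_loss[task]
    return total
```

Changing the `active` argument is one simple way to compare task combinations, for example body part segmentation trained with depth estimation only versus with all four tasks.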