
Documat


Summary of Deep learning and uncertainty modeling in visual food analysis

Eduardo Aguilar Torres

  • Several computer vision approaches have been proposed for tackling food analysis problems, owing to the challenge they pose, the ease of collecting food images, and the numerous applications to health and leisure. However, high food ambiguity, intra-class variability and inter-class similarity pose a real challenge for deep learning and computer vision algorithms. With the advent of Convolutional Neural Networks, the complex problem of visual food analysis has seen significant improvement. Despite this, for real applications, where thousands of foods must be analyzed and recognized, it is necessary to better understand what the model learns and, from this, guide its learning toward more discriminative features to improve its accuracy and robustness.

    In this thesis we address the problem of analyzing food images through methods based on deep learning algorithms. The thesis has two distinct parts. In the first part, we focus on the food recognition task and delve into uncertainty modeling. First, we propose a new multi-task model that is able to simultaneously predict different food-related tasks. Here, we extend homoscedastic uncertainty modeling to allow single-label and multi-label classification and propose a regularization term that jointly weighs the tasks as well as their correlations. Second, we propose a novel prediction scheme based on a class hierarchy that considers local classifiers in addition to a flat classifier. For this, we define criteria based on the epistemic uncertainty estimated from the 'children' classifiers and the prediction from the 'parent' classifier to decide which approach to use. Third, we propose three new data augmentation strategies that analyze class-level or sample-level epistemic uncertainty to guide the model training.
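
As a rough illustration of homoscedastic uncertainty weighting in multi-task learning, the sketch below combines per-task losses using one learned log-variance per task, following a commonly used formulation. It is an assumption-laden sketch, not the thesis's extended model: the function name `uncertainty_weighted_loss`, the `0.5 * s` penalty, and the example values are all hypothetical, and the regularization term over task correlations described above is not included.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses with homoscedastic uncertainty weights.

    Each task i has a log-variance s_i = log(sigma_i^2). Its loss is
    scaled by exp(-s_i), and 0.5 * s_i is added so that the variance
    cannot grow without bound. (Hypothetical helper for illustration.)
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is required per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        total += math.exp(-s) * loss + 0.5 * s
    return total


# Two tasks with equal uncertainty (s = 0): the result is the plain sum.
equal = uncertainty_weighted_loss([1.0, 2.0], [0.0, 0.0])

# Raising the second task's variance to 4 down-weights its loss by 1/4.
uneven = uncertainty_weighted_loss([1.0, 2.0], [0.0, math.log(4.0)])
```

In a real training loop the log-variances would be trainable parameters updated together with the network weights; here they are fixed numbers only to show the arithmetic.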

    In the second part we contribute to the design of new methods for food detection (food/non-food classification), for ensembles of food classifiers and for semantic food detection. First, we present an overview of the latest advances in food/non-food classification and an optimal model based on the GoogLeNet architecture, Principal Component Analysis, and a Support Vector Machine. Second, we propose a combination of multiple classifiers for food recognition based on two different convolutional models that complement each other and thus achieve an improvement in performance. Third, we address the problem of automatic food tray analysis in canteen and restaurant environments through a new approach that integrates food localization, recognition and segmentation in the same framework for semantic food detection.
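
One simple way to combine two complementary classifiers is to average their class-probability vectors and take the most probable class. The sketch below assumes that fusion rule purely for illustration; the summary does not state which combination rule the thesis uses, and the function name `average_ensemble` and the example probabilities are hypothetical.

```python
def average_ensemble(probs_a, probs_b):
    """Fuse two class-probability vectors by element-wise averaging.

    Returns the index of the most probable class and the fused vector.
    (Hypothetical helper; averaging is an assumed fusion rule.)
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    fused = [(a + b) / 2.0 for a, b in zip(probs_a, probs_b)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Model A prefers class 0, model B prefers class 1 more strongly,
# so the averaged scores favour class 1.
label, fused = average_ensemble([0.6, 0.3, 0.1], [0.2, 0.7, 0.1])
```

Averaging only helps when the two models make different mistakes, which is the sense in which the classifiers are described as complementing each other.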

    All the methods designed in this thesis are validated and compared on relevant public food datasets, and the results obtained are reported in detail.


