Carlos Serra Toro
Detecting pedestrians in monocular static images has attracted much interest in recent years. The problem consists of determining where humans, if any, appear in a scene. Its applications range from industrial ones (such as automotive safety, street control via CCTV, and access control in surveillance systems) to more domestic-oriented products (such as categorising and querying a personal photo collection, or building a cheap home-made surveillance system with resources like the Raspberry Pi). Despite the progress made in this field, challenges remain, especially regarding the expressiveness of descriptors for tasks other than, or related to, classification (such as dealing with occlusions or characterisation) and the speed of the detection process, among others.
This thesis addresses two areas related to the problem of recognising pedestrians in still images: (i) representation issues, concerning how an image is encoded to help an algorithm determine whether it contains a pedestrian; and (ii) the learning models that enable a computer to perform that evaluation. Thus, the thesis focuses on the representation and classification parts of the detection process.
Concerning the representation issues, a framework for pedestrian classification is proposed, inspired by spatial recurrences in the form of recurrence plots, resulting in a new descriptor. This representation is more general and potentially more discriminative than a histogram of co-occurrences, since the correlation information between different spatial locations is maintained rather than just summarised into histograms. The descriptor is also tested on occluded images and on the problem of view recognition (frontal vs. back), comparing favourably under certain conditions to a state-of-the-art descriptor (CoHOG). To reduce the dimensionality of binary-feature descriptors such as the one engineered in this thesis, a simple dimensionality reduction algorithm based on compression is proposed and validated against state-of-the-art unsupervised (PCA) and supervised (ReliefF and SOAP) algorithms on several datasets and descriptors, including the proposed one. To close the representation part, work developed in the early stages of this thesis is presented, which resulted in one of the first published studies on gender characterisation. It studies the effect of different rectangular grid configurations depending on the view (frontal, back, or mixed), showing that recognising gender in the frontal view requires coarser grids than recognising gender from back views. In addition, this part surveys some of the recent advances in gender characterisation using full-body static images.
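As an illustration of the general idea of a recurrence plot over spatial locations, the following minimal sketch builds a binary matrix marking which pairs of image cells have similar features, and flattens it into a binary vector. It is a generic sketch, not the descriptor of the thesis: the per-cell features, the Euclidean distance, the threshold, and the function names `recurrence_plot` and `upper_triangle_descriptor` are all assumptions made for this example.

```python
import math


def recurrence_plot(features, threshold):
    """Return a binary matrix R where R[i][j] = 1 if cells i and j are similar.

    `features` is a list of per-cell feature vectors (hypothetical input);
    similarity is assumed to mean Euclidean distance <= `threshold`.
    """
    n = len(features)
    plot = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if math.dist(features[i], features[j]) <= threshold:
                plot[i][j] = 1
    return plot


def upper_triangle_descriptor(plot):
    """Flatten the strict upper triangle of a symmetric plot into a binary vector."""
    n = len(plot)
    return [plot[i][j] for i in range(n) for j in range(i + 1, n)]


# Three toy cells: the first two are similar, the third is not.
cells = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0]]
plot = recurrence_plot(cells, threshold=0.5)
descriptor = upper_triangle_descriptor(plot)
```

Unlike a histogram, each entry of the flattened vector still refers to a specific pair of locations, which is the property the paragraph above highlights.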
Regarding the part of the thesis related to learning models, three unconventional (or emerging) learning paradigms are explored for their application to the problem of pedestrian classification. The Relevance Vector Machine (RVM) is a kind of kernel machine that yields a sparser classifier than the Support Vector Machine (SVM), and therefore one that evaluates faster. The thesis studies whether RVM can substitute for SVM in pedestrian classification and, although not a goal in itself, also presents a way of speeding up the (highly costly) training of RVM. A brief survey of state-of-the-art techniques for accelerating the pedestrian detection pipeline is also presented. The paradigm of Learning Using Privileged Information (LUPI) is explored as well: during training, the classifier is fed information that is not available at test time (so-called privileged information), with the purpose of improving its generalisation power. To this end, the SVM+ classifier (the version of SVM implementing the LUPI paradigm) is studied. It is shown that randomly generated features alone may play a key role as privileged information in the LUPI paradigm, thus challenging the concept of privileged information. It is also shown experimentally that increasing the training set at the cost of reducing the validation set can make SVM preferable to its LUPI version (SVM+), thus eliminating the need to obtain or craft privileged information, at least for those problems in which a reasonable number of validation examples exist. To finish the thesis, a brief incursion is made into sparse representation for pedestrian classification by exploring several dictionary learning strategies. The thesis shows that learning a single dictionary provides classification performance similar to learning two dictionaries, the latter offering just a marginal benefit.
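For reference, the SVM+ classifier mentioned above is usually stated in the LUPI literature as the optimisation problem below. This is the commonly cited form, and the abstract does not say whether the thesis uses exactly this variant. Here $x_i$ are the regular features, $x_i^*$ the privileged features available only during training, $y_i \in \{-1, +1\}$ the labels, and $C$ and $\gamma$ are hyperparameters.

```latex
\begin{aligned}
\min_{w,\, b,\, w^*,\, b^*} \quad
  & \frac{1}{2}\lVert w \rVert^2
    + \frac{\gamma}{2}\lVert w^* \rVert^2
    + C \sum_{i=1}^{n} \bigl( \langle w^*, x_i^* \rangle + b^* \bigr) \\
\text{s.t.} \quad
  & y_i \bigl( \langle w, x_i \rangle + b \bigr)
    \ge 1 - \bigl( \langle w^*, x_i^* \rangle + b^* \bigr),
    \quad i = 1, \dots, n, \\
  & \langle w^*, x_i^* \rangle + b^* \ge 0,
    \quad i = 1, \dots, n.
\end{aligned}
```

The slack of each training example is modelled as a function of its privileged features, so at test time only $w$ and $b$ are needed and the privileged information is not required.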