Resumen de A Hybrid Post Hoc Interpretability Approach for Deep Neural Networks

Flávio Arthur Oliveira Santos, Cleber Zanchettin, José Vitor Santos Silva, Leonardo Nogueira Matos, Paulo Novais

Every day researchers publish works with state-of-the-art results using deep learning models, however as these models become common even in production, ensuring fairness is a main concern of the deep learning models. One way to analyze the model fairness is based on the model interpretability, obtaining the essential features to the model decision. There are many interpretability methods to produce the deep learning model interpretation, such as Saliency, GradCam, Integrated Gradients, Layer-wise relevance propagation, and others. Although those methods make the feature importance map, different methods have different interpretations, and their evaluation relies on qualitative analysis. In this work, we propose the Iterative post hoc attribution approach, which consists of seeing the interpretability problem as an optimization view guided by two objective definitions of what our solution considers important. We solve the optimization problem with a hybrid approach considering the optimization algorithm and the deep neural network model. The obtained results show that our approach can select the features essential to the model prediction more accurately than the traditional interpretability methods.

Acceso de usuarios registrados

¿Es nuevo? Regístrese

Coordinado por: