Disagreement amongst counterfactual explanations: how transparency can be misleading

  • Dieter Brughmans [1]; Lissa Melis [2]; David Martens [1]
    1. [1] University of Antwerp, Antwerp, Belgium
    2. [2] Civil and Environmental Engineering Department, Pennsylvania State University, 212 Sackett Building, University Park, PA 16802, USA; School of Business and Economics, Maastricht University, Tongersestraat 53, 6211 LM Maastricht, The Netherlands
  • Published in: Top, ISSN-e 1863-8279, ISSN 1134-5764, Vol. 32, No. Extra 3, 2024 (Special issue: Mathematical Optimization and Machine Learning), pp. 429-462
  • Language: English
  • DOI: 10.1007/s11750-024-00670-2
  • Abstract
    • Counterfactual explanations are increasingly used as an Explainable Artificial Intelligence (XAI) technique to provide stakeholders of complex machine learning algorithms with explanations for data-driven decisions. The popularity of counterfactual explanations has resulted in a boom in the algorithms generating them. However, not every algorithm creates uniform explanations for the same instance. Even though in some contexts multiple possible explanations are beneficial, there are circumstances where diversity amongst counterfactual explanations results in a potential disagreement problem among stakeholders. Ethical issues arise when, for example, malicious agents use this diversity to fairwash an unfair machine learning model by hiding sensitive features. As legislators worldwide begin to include a right to explanation for data-driven, high-stakes decisions in their policies, these ethical issues should be understood and addressed. Our literature review on the disagreement problem in XAI reveals that this problem has never been empirically assessed for counterfactual explanations. Therefore, in this work, we conduct a large-scale empirical analysis on 40 data sets, using 12 explanation-generating methods, for two black-box models, yielding over 192,000 explanations. Our study finds alarmingly high disagreement levels between the methods tested: a malicious user is able to both exclude and include desired features when multiple counterfactual explanations are available. This disagreement seems to be driven mainly by the data set characteristics and the type of counterfactual algorithm. XAI centers on the transparency of algorithmic decision-making, but our analysis advocates for transparency about this self-proclaimed transparency.
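
The abstract quantifies disagreement at the feature level: different counterfactual algorithms change different features of the same instance, which lets a malicious user pick the explanation that includes or excludes a desired feature. A minimal sketch of one plausible formalization, assuming disagreement is scored as one minus the Jaccard similarity of the changed-feature sets (the paper's exact metric may differ; the data and method outputs below are hypothetical):

    # Hypothetical sketch: disagreement between two counterfactual
    # explanations, measured on the sets of features they change.

    def changed_features(instance: dict, counterfactual: dict) -> set:
        """Features whose values differ between the original instance
        and its counterfactual explanation."""
        return {f for f in instance if instance[f] != counterfactual.get(f)}

    def feature_disagreement(instance: dict, cf_a: dict, cf_b: dict) -> float:
        """1 - Jaccard similarity of the changed-feature sets
        (0.0 = identical explanations, 1.0 = no overlap at all)."""
        a = changed_features(instance, cf_a)
        b = changed_features(instance, cf_b)
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    # Illustrative loan application and counterfactuals produced by two
    # (hypothetical) explanation algorithms for the same rejected instance.
    x    = {"income": 30_000, "gender": "F", "debt": 10_000}
    cf_1 = {"income": 45_000, "gender": "F", "debt": 10_000}  # changes income
    cf_2 = {"income": 30_000, "gender": "M", "debt": 5_000}   # changes gender, debt

    print(feature_disagreement(x, cf_1, cf_2))  # 1.0: fully disjoint explanations

Here cf_1 attributes the rejection to income alone, while cf_2 surfaces the sensitive feature gender; presenting only cf_1 to a stakeholder is exactly the fairwashing risk the abstract describes.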

