| Abstract | INTRODUCTION: The need for eXplainable Artificial Intelligence (XAI) in healthcare is more critical than ever, especially as regulatory frameworks such as the European Union Artificial Intelligence (EU AI) Act mandate transparency in clinical decision support systems. Post hoc XAI techniques such as Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP) and Partial Dependence Plots (PDPs) are widely used to interpret Machine Learning (ML) models for disease risk prediction, particularly on tabular Electronic Health Record (EHR) data. However, their reliability in real-world scenarios is not fully understood. Class imbalance is a common challenge in many real-world datasets, but it is rarely accounted for when evaluating the reliability and consistency of XAI techniques. METHODS: In this study, we design a comparative evaluation framework to assess the impact of class imbalance on the consistency of model explanations generated by LIME, SHAP, and PDPs. Using UK primary care data from the Clinical Practice Research Datalink (CPRD), we train three ML models, XGBoost (XGB), Random Forest (RF), and Multi-layer Perceptron (MLP), to predict lung cancer risk, and we evaluate how interpretability under class imbalance compares with interpretability on a balanced dataset. To our knowledge, this is the first study to evaluate explanation consistency under class imbalance across multiple models and interpretation methods using real-world clinical data. RESULTS: Our main finding is that class imbalance in the training data can significantly affect the reliability and consistency of LIME and SHAP explanations when evaluated against models trained on balanced data. To explain these empirical findings, we also present a theoretical analysis of LIME and SHAP showing why explanations change under different class distributions. We also find that PDPs vary noticeably between models trained on imbalanced and balanced datasets with respect to clinically relevant features for predicting lung cancer risk. DISCUSSION: These findings highlight a critical vulnerability in current XAI techniques: their interpretability is significantly affected by skewed class distributions, which are common in medical data. They also emphasise the importance of consistent model explanations for trustworthy ML deployment in healthcare. |
| Journal | Frontiers in Artificial Intelligence |
| Date added to PubMed | 2025/12/1 |
| Authors | Rai, Teena; He, Jun; Kaur, Jaspreet; Shen, Yuan; Mahmud, Mufti; Brown, David J; O'Dowd, Emma; Baldwin, David |
| Affiliations | Department of Computer Science, Nottingham Trent University, Nottingham, United Kingdom; Division of Epidemiology and Public Health, University of Nottingham, Nottingham, United Kingdom; Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia |
| PubMed link | https://www.ncbi.nlm.nih.gov/pubmed/41322476/ |