Novel interpretable meta-learning approach for Alzheimer's disease diagnosis and fixed-horizon progression prediction using variable-length longitudinal sequences
Abstract
The early diagnosis and effective treatment of Alzheimer's disease (AD) depend on accurate prediction of its stages. In the US, men aged 65 have a 19.5% lifetime risk of AD, while women have a 21.1% risk. In 2020, more than 6.1 million people aged 65 and older in the United States were living with AD, a number projected to rise to 8.5 million by 2030. This research addresses the multi-class problem of classifying AD into three categories, Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and Dementia, using clinical and imaging data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. We propose a novel stacking ensemble model, TriBoost-StackNet, that uses Random Forest (RF), XGBoost, and LightGBM as base learners and Logistic Regression (LR) as the meta-classifier. The model attained a classification accuracy of 97.61%, markedly outperforming methods reported in recent studies. Beyond diagnosis, the study jointly forecasts Mini-Mental State Examination (MMSE) scores and clinical diagnoses (DX) at fixed horizons of 12, 18, 24, and 36 months by modeling patient trajectories as variable-length longitudinal sequences, and it supports interpretability through SHAP (SHapley Additive exPlanations). Evaluated with 5-fold cross-validation, the XGBoost forecasting model achieved an R² of 0.87 for MMSE prediction at 24 months and 83.79% accuracy for DX classification at 12 months. These results indicate that interpretable, tree-based ensemble models can achieve superior performance in predicting AD stages with minimal preprocessing, offering a viable approach for clinical use.