Objective: Accurate prediction of pathological complete response (pCR) following neoadjuvant therapy (NAT) is critical for surgical risk stratification and treatment decision-making. This study develops and validates an interpretable machine learning (ML) model integrating computed tomography (CT) body composition, inflammatory, and nutritional indicators to predict pCR in breast cancer patients after NAT.
Methods: Retrospective data from 189 breast cancer patients treated at Jiangxi Cancer Hospital (January 2019–June 2023) were analyzed, including CT-based body composition parameters and blood test variables. After variable selection, eight ML algorithms were used to construct models. The optimal model was identified by receiver operating characteristic (ROC) curve analysis, and its generalizability was verified in the test set. Calibration and clinical utility were assessed with calibration curves and decision curve analysis, and Shapley additive explanation (SHAP) analysis was used to visualize individualized predictions.
Results: After multivariable adjustment, HR-negative/HER2-negative subtypes, visceral adipose tissue density, skeletal muscle density, albumin-to-alkaline phosphatase ratio, systemic inflammation response index, and intramuscular adipose tissue content were associated with pCR. The XGBoost model performed best, achieving an area under the ROC curve (AUC) of 0.882 (95% CI: 0.756–0.996) in internal validation and 0.845 (95% CI: 0.735–0.955) in the test set. The Brier score was 0.184. Decision curve analysis showed a favorable net benefit, and SHAP analysis clarified the contribution of key predictors.
Conclusions: An interpretable ML model combining CT body composition with inflammatory and nutritional indicators effectively predicts pCR after NAT for breast cancer, supporting clinical decision-making and potentially improving prognosis.
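For readers less familiar with the two headline metrics reported above, the following is a minimal sketch of how an AUC and a Brier score can be computed from predicted probabilities. It is not the study's code: the function names `brier_score` and `roc_auc` and the toy labels and probabilities are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome.

    0 is a perfect score; lower values indicate better calibrated,
    more accurate probabilities.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC via its rank interpretation.

    Computes the fraction of (positive, negative) pairs in which the
    positive case receives the higher predicted probability; tied
    pairs count as one half.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pCR achieved, 0 = no pCR.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.7, 0.4, 0.3, 0.6]

toy_brier = brier_score(toy_labels, toy_probs)
toy_auc = roc_auc(toy_labels, toy_probs)
```

The pairwise AUC formulation is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based implementation would be preferable for large datasets.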