Objective To develop and validate an interpretable machine learning-based frailty prediction model for older adults with diabetes, enabling early identification of patients at high risk of frailty.
Methods Using convenience sampling, 232 older inpatients with diabetes admitted to a local tertiary grade A general hospital between January and May 2024 were recruited. The 227 patients who completed the study were randomly assigned in a 7:3 ratio to a training set (n = 158) for model development and a testing set (n = 69) for validation. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression and the Boruta algorithm. Three machine learning models were built: logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGBoost). Model performance was evaluated using the area under the curve (AUC), sensitivity, specificity, F1 score, and other indicators, and DeLong's test was used to compare AUCs between models. The key predictors of the optimal model were interpreted using SHapley Additive exPlanations (SHAP), and a web-based calculator was developed with Streamlit to visualise the model.
Results Among the 227 patients, 99 (43.6%) were frail. The XGBoost model showed the best overall performance: in both the training and testing sets, DeLong's test showed that its AUC was higher than those of LR and SVM (all P<0.001). In the training set, XGBoost achieved an AUC of 0.920, accuracy of 0.842, sensitivity of 0.783, specificity of 0.887, positive predictive value (PPV) of 0.845, negative predictive value (NPV) of 0.840, and F1 score of 0.810. In the testing set, it achieved an AUC of 0.806, accuracy of 0.681, sensitivity of 0.633, specificity of 0.743, PPV of 0.731, NPV of 0.744, and F1 score of 0.620. SHAP analysis ranked the predictors of frailty, in descending order of importance, as cognitive impairment, Charlson Comorbidity Index, chronic pain, amount of physical exercise, sarcopenia, nutritional status, and diabetic nephropathy.
Conclusion The SHAP-interpretable XGBoost frailty prediction model developed in this study effectively identifies older inpatients with diabetes at high risk of frailty, together with the main risk factors, and can support their health management strategies.
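The performance indicators reported above (AUC, accuracy, sensitivity, specificity, PPV, NPV, F1 score) can all be derived from the predicted probabilities and a classification threshold. The following is a minimal sketch of those definitions in plain Python, not the study's actual analysis code. The function names, the 0.5 threshold, and the toy labels and scores are hypothetical and are not study data. It assumes binary labels (1 = frail, 0 = not frail) and that both classes and both predicted classes are present.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 = frail."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-dependent metrics computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def auc_rank(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 frail and 6 non-frail patients with predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
threshold = 0.5  # assumed cut-off; the abstract does not state the one used

y_pred = [1 if s >= threshold else 0 for s in scores]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
auc = auc_rank(y_true, scores)
```

The AUC does not depend on the threshold, whereas the other six indicators all change when the cut-off moves, which is why they are reported separately. The F1 score in particular is fully determined by PPV and sensitivity.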





