## Abstract

**Introduction** Patients with diabetes mellitus are at risk of premature death. In this study, we developed a machine learning-driven predictive risk model for all-cause mortality among patients with type 2 diabetes mellitus using a multiparametric approach with data from different domains.

**Research design and methods** This study used territory-wide data of patients with type 2 diabetes attending public hospitals or their associated ambulatory/outpatient facilities in Hong Kong between January 1, 2009 and December 31, 2009. The primary outcome was all-cause mortality. The association between risk variables and all-cause mortality was assessed using Cox proportional hazards models. Machine and deep learning approaches were used to improve overall survival prediction and were evaluated with a fivefold cross-validation method.

**Results** A total of 273 678 patients (mean age: 65.4±12.7 years, male: 48.2%, median follow-up: 142 (IQR=106–142) months) were included, with 91 155 deaths occurring on follow-up (33.3%; annualized mortality rate: 3.4%/year; 2.7 million patient-years). Multivariate Cox regression found the following significant predictors of all-cause mortality: age, male gender, baseline comorbidities, anemia, mean values of neutrophil-to-lymphocyte ratio, high-density lipoprotein-cholesterol, total cholesterol, triglyceride, HbA1c and fasting blood glucose (FBG), measures of variability of both HbA1c and FBG. The above parameters were incorporated into a score-based predictive risk model that had a c-statistic of 0.73 (95% CI 0.66 to 0.77), which was improved to 0.86 (0.81 to 0.90) and 0.87 (0.84 to 0.91) using random survival forests and deep survival learning models, respectively.

**Conclusions** A multiparametric model incorporating variables from different domains predicted all-cause mortality accurately in type 2 diabetes mellitus. The predictive and modeling capabilities of machine/deep learning survival analysis achieved more accurate predictions.

- epidemiology
- risk factors

## Data availability statement

Data are available in a public, open access repository. Data are available on reasonable request. An anonymized version of the dataset has been deposited on Zenodo (https://zenodo.org/record/4383385), in full compliance with University Regulations and Policy on Dataset Deposit and Sharing. For additional information: https://libguides.lib.cuhk.edu.hk/RDM/dataset_deposit.

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/licenses/by/4.0/.

### Significance of this study

#### What is already known about this subject?

Increased variability in metabolic parameters is predictive of higher mortality in type 2 diabetes mellitus.

#### What are the new findings?

We developed a machine learning-driven predictive risk model for type 2 diabetes mellitus using a multiparametric approach with data from different domains.

Measures of variability of fasting glucose and HbA1c show similar predictive power for all-cause mortality, regardless of whether adjustments were made for initial values or mean values across follow-up.

A multiparametric predictive risk model incorporating variables from different domains, including baseline demographics, comorbidities, laboratory tests and measures of variability of HbA1c and fasting blood glucose, predicted all-cause mortality accurately.

Machine learning-driven algorithms further improved the accuracy of the predictive models.

#### How might these results change the focus of research or clinical practice?

A simple, easy-to-use score-based system has been devised to enable rapid risk prediction in the clinical setting.

## Introduction

Type 2 diabetes mellitus is one of the most common metabolic conditions, with an increasing prevalence attributable to aging, sedentary lifestyles, environmental changes and better disease management.1–3 Patients with this condition are at an increased risk of premature death and other complications.4 5 Several risk models have been developed, such as QDiabetes for predicting new-onset diabetes,6 and the CORE,7 BRAVO8 and Michigan9 models for predicting disease progression, complications and mortality. These have generated good predictive results in Western cohorts but are limited in their direct applicability to Asian populations. For example, Chinese patients have a lower body mass index threshold for diabetes development and a higher propensity to develop chronic kidney disease as a result.10 11 While Asian population-specific models are available,12–15 these have generally not incorporated temporal measures of variability for longitudinal data or machine learning approaches, both of which can enhance risk prediction.16 17 Indeed, with the rapid development of big data analytics, it has become easier to improve discrimination by analyzing complex interactions among variables. Previously, a machine learning-driven approach demonstrated superior performance for predicting diabetes onset in a Chinese cohort.18

In this territory-wide study, with the aid of machine/deep learning approaches, we developed a risk model for mortality prediction using multiparametric data from different domains. These include baseline comorbidities, measures of variability of fasting glucose and HbA1c, inflammatory and nutritional indices and drug prescription details. We tested the hypothesis that machine learning methods (random survival forests, RSF19) and deep neural survival learning models (DeepSurv20) can significantly improve predictive performance when compared with Cox regression-based models.

## Methods

### Study design and data source

Patients were included if they received antidiabetic medications or had International Classification of Diseases, Ninth Revision (ICD-9) codes for type 2 diabetes mellitus, and attended any of the 43 public hospitals or their associated ambulatory or outpatient facilities managed by the Hong Kong Hospital Authority between January 1 and December 31, 2009. Data were obtained from the Clinical Data Analysis and Reporting System, a healthcare database that integrates patient information to establish comprehensive medical records with accurately linked mortality data. This system has been used for epidemiological research by multiple teams, including ours,21–24 and for model development studies.25 26

### Data extraction

Baseline patient characteristics included demographic details such as age and sex, prior comorbidities (heart failure (HF), ischemic heart disease (IHD), ischemic stroke, aborted sudden cardiac death (SCD) of all causes including acute myocardial infarction, atrial fibrillation (AF), peripheral vascular disease, intracranial hemorrhage, osteoporosis, dementia, hypertension, chronic obstructive pulmonary disease (COPD), cancer, renal and ophthalmological diabetic complications), and antidiabetic and cardiovascular medications. The ICD-9 codes for these comorbidities are summarized in online supplemental table 1. The duration of type 2 diabetes mellitus from the point of diagnosis until December 31, 2009 was also extracted. The point of diagnosis was determined by the earliest fulfillment of any of the following criteria, in this order: (1) initial documentation of type 2 diabetes mellitus-related ICD-9 codes; (2) earliest HbA1c >6.5%; (3) earliest fasting blood glucose (FBG) >7 mmol/L. Time to all-cause mortality was determined as the number of days from the starting date of patient inclusion, January 1, 2009, until the day of death or the end of the follow-up period, December 31, 2019.

The following laboratory data were collected at baseline: neutrophil-to-lymphocyte ratio (NLR), derived by dividing the absolute neutrophil count by the lymphocyte count; anemia, defined as hemoglobin <13 g/dL for men and <12 g/dL for women; and biochemical test results including (1) creatinine, sodium and potassium, (2) urea, (3) albumin and total protein, (4) alanine aminotransferase and alkaline phosphatase, (5) FBG and HbA1c, and (6) high-density lipoprotein-cholesterol (HDL-C), directly measured low-density lipoprotein-cholesterol (LDL-C), total cholesterol and triglyceride.

The number of antidiabetic drugs by class was extracted: (1) insulin, (2) biguanide, (3) sulphonylurea, (4) alpha-glucosidase inhibitor, (5) thiazolidinedione, (6) dipeptidyl peptidase-4 inhibitor, (7) glucagon-like peptide-1 receptor agonist, (8) meglitinide. Similarly, the number of antihypertensive medications of the following classes was extracted: (1) angiotensin-converting enzyme inhibitor/angiotensin receptor blocker, (2) beta-adrenergic receptor blocker, (3) calcium channel blocker, (4) diuretics. Data on lipid-lowering agents were also extracted.

### Variability calculations

To calculate FBG and HbA1c variability, data points were obtained for the period between January 1, 2004 and December 31, 2008. Only patients with three or more measurements of a given parameter were included in the variability analysis of that parameter. The different measures are detailed below and summarized in online supplemental table 2: (1) SD; (2) absolute variability score, defined as 100 × (no. of measurements >0.5)/(no. of measurements); (3) percentage variability score, defined as 100 × (no. of measurements >10% of previous measurement)/(no. of measurements); (4) normalized absolute variability score, given by (2)/individual mean; (5) normalized percentage variability score, given by (3)/individual mean; (6) SD/individual baseline; (7) coefficient of variation, given by SD/individual mean; (8) variability independent of mean, given by SD/(individual mean)^(ln(population SD)/ln(population mean)).
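The eight measures can be illustrated with a short Python sketch. This is not the authors' code: the function name, argument names and dictionary keys are hypothetical, and the sketch assumes that "measurements >0.5" and "measurements >10% of previous measurement" refer to changes between successive measurements, and that the individual baseline is the first recorded value.

```python
import math
import statistics


def variability_measures(values, population_sd, population_mean,
                         abs_threshold=0.5, pct_threshold=10.0):
    """Return the eight variability measures for one patient's serial values.

    Assumptions (not confirmed by the source): threshold counts are applied
    to changes between successive measurements, and values[0] is the baseline.
    """
    if len(values) < 3:
        raise ValueError("at least three measurements are required")

    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)

    # Absolute and percentage change of each measurement from the previous one.
    pairs = list(zip(values, values[1:]))
    changes = [abs(b - a) for a, b in pairs]
    pct_changes = [100.0 * abs(b - a) / a for a, b in pairs]

    avs = 100.0 * sum(c > abs_threshold for c in changes) / n
    pvs = 100.0 * sum(p > pct_threshold for p in pct_changes) / n

    # Exponent of the variability-independent-of-mean (VIM) measure.
    beta = math.log(population_sd) / math.log(population_mean)

    return {
        "sd": sd,
        "avs": avs,
        "pvs": pvs,
        "normalized_avs": avs / mean,
        "normalized_pvs": pvs / mean,
        "sd_over_baseline": sd / values[0],
        "cv": sd / mean,
        "vim": sd / mean ** beta,
    }
```

For example, `variability_measures([7.0, 8.0, 7.0, 9.0], population_sd=2.0, population_mean=8.0)` returns all eight measures for a patient with four FBG readings; the population values here are placeholders, not cohort statistics.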

### Outcomes and statistical analysis

The primary outcome of the present study was all-cause mortality. Univariate Cox regression was applied to identify significant predictors of all-cause mortality, and HRs with 95% CIs were reported. Variables achieving p<0.10 were included in a diabetes duration-adjusted multivariate model. Statistical significance was defined as p<0.05. FBG and HbA1c variability measures calculated with the same formula were paired and added to the multivariate model to assess their predictiveness through comparison of HRs, thus preventing problems with collinearity.

To generate a predictive score, Cox regression was repeated for the final multivariate model with measures of variability included. An HR between 1 and 1.50 was awarded 1 point in the score. To adjust for the U-shaped relationships with mortality reported for HDL-C, LDL-C, total cholesterol and HbA1c, these parameters were first divided into deciles, which served as cut-offs for univariate Cox regression. Thereafter, the decile with the smallest HR was selected as the reference and compared against the remaining deciles through univariate Cox regression again. The minimum and maximum cut-offs of the deciles that did not differ significantly from the reference decile were selected as the cut-offs to be used in the score. To demonstrate the U-shaped relationships, the HRs of the deciles were plotted graphically. Existing studies have used similar methods to illustrate U-shaped relationships.27–29 Cut-off values for continuous variables in the score were found by maximizing sensitivity and specificity. Age and diabetes duration were rounded to the nearest whole number, while other parameters were rounded to two decimal places. The predictive value of the score was evaluated by generating a receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC).
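One common way to choose a cut-off that maximizes sensitivity and specificity is Youden's J statistic (sensitivity + specificity − 1). The source does not state which criterion was used, so the sketch below is an assumption, and the function name and toy data are hypothetical.

```python
def youden_cutoff(values, died):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A patient is classified as high risk when value >= cutoff.
    """
    positives = sum(died)
    negatives = len(died) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")

    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        # True positives: deaths at or above the cut-off.
        tp = sum(1 for v, d in zip(values, died) if d and v >= cutoff)
        # True negatives: survivors below the cut-off.
        tn = sum(1 for v, d in zip(values, died) if not d and v < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

With toy ages `[50, 55, 60, 70, 75, 80]` and outcomes `[0, 0, 0, 1, 1, 1]`, the function selects 70 as the cut-off because it separates the two groups perfectly. Ties between cut-offs are resolved in favor of the smallest value.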

To further evaluate the predictive value of the measures of variability, the measures were also divided into quartiles, with the first quartile as a reference, to perform univariate Cox regression and assess the AUC of the quartile cut-offs. The quartile HR of the FBG and HbA1c measures of variability were illustrated graphically. Statistical analyses were performed using RStudio software (V.1.1.456) and Python (V.3.6).

### Development of machine/deep models for survival learning

Machine/deep learning survival analysis models can directly capture the relationships between risk predictors and mortality outcome without the prior functional assumptions typically made in Cox analysis models. Here we used an RSF model, a machine learning method for survival analysis that relies on the intuition that combining many weak decision tree learners yields a strong survival learning model that minimizes overall survival prediction error. Prediction errors are measured by performance evaluators such as precision, recall, AUC and C-index. The out-of-bag (OOB) method was adopted: for each tree, a bootstrap sample (the in-bag subset) is drawn with replacement from the training dataset and used to grow the tree, which results in well-defined subsets. Some individuals appear more than once in the bootstrap sample and are members of the in-bag subset, and the remaining individuals define the OOB subset for that tree. Each individual in the OOB subset for a tree is dropped down that tree and assigned a unique terminal node membership and terminal node statistic. An OOB ensemble statistic for each individual is formed by combining the terminal node statistics from all trees for which the individual is an OOB member. Finally, the class with the maximum frequency in the OOB ensemble statistic serves as the predicted class label for that individual. More detailed descriptions of these concepts are given by Breiman *et al*.30
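The in-bag/OOB split and the majority vote over OOB predictions can be sketched as follows. This is an illustration of the general bootstrap idea rather than the randomForestSRC implementation; both function names are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag, out_of_bag) index lists."""
    # Sampling with replacement: some indices repeat, others are never drawn.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def oob_majority_vote(oob_predictions):
    """Combine per-tree OOB predictions into one label per individual.

    oob_predictions maps an individual's index to the labels predicted by the
    trees for which that individual was out of bag. Ties are broken
    arbitrarily in this sketch.
    """
    return {
        idx: max(set(labels), key=labels.count)
        for idx, labels in oob_predictions.items()
        if labels
    }
```

For large samples, about 36.8% of individuals (a fraction approaching 1/e) are expected to fall in the OOB subset of any given tree, so every individual is OOB for a substantial share of the trees and receives an OOB ensemble prediction.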

The importance of a variable of interest is calculated as the prediction error (squared loss) of the original ensemble event-specific cumulative probability function (obtained when each OOB instance is simply dropped down its in-bag competing risks tree) subtracted from the prediction error of the new ensemble obtained after the values of that variable are randomized.31 32 In this study, RSF was used for mortality prediction, and the most important predictors were ranked according to the variable importance measure in RSF. Variables that were important predictors of the outcome have a larger importance value, indicating higher predictive strength, whereas non-predictive variables have zero or negative values.

We further employed a nonlinear deep learning survival method, the Cox proportional hazards deep neural network termed DeepSurv. This approach can inherently and adaptively model the high-level interaction patterns among risk predictors and thus can better capture the complex nonlinear relationship between patients’ covariates (eg, clinical features) and mortality outcome directly. Specifically, DeepSurv is a deep feed-forward neural network that predicts the effects of a patient’s baseline covariates on their hazard rate, parameterized by the weights of the neural network. The input of DeepSurv is the baseline variables of the patient with diabetes. The hidden layers of DeepSurv consist of a fully connected layer of nodes, followed by a dropout layer.33 The output of DeepSurv is a single node with a linear activation, which estimates the log-risk function in the Cox model. In this study, we trained DeepSurv by setting the objective function to be the average negative log Cox partial likelihood with L2-regularization,34 to model mortality risk in patients with diabetes. Gradient descent optimization was used to find the weights of DeepSurv. The hyperparameters of DeepSurv, including the number of hidden layers, the number of nodes in each layer and the dropout probability, were determined using a random hyperparameter search approach.35
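The objective described above, the average negative log Cox partial likelihood with an L2 penalty, can be written out in plain Python for a small example. This sketch is not the DeepSurv package code: the function name and argument layout are hypothetical, and tied event times are handled by including every patient with follow-up time at least as long as the event time in the risk set.

```python
import math


def cox_loss(log_risks, times, events, weights, l2):
    """Average negative log Cox partial likelihood plus an L2 penalty.

    log_risks: network outputs h(x_i), one per patient
    times:     follow-up time per patient
    events:    1 if death was observed, 0 if censored
    weights:   flat list of model weights entering the penalty
    l2:        regularization coefficient
    """
    n_events = sum(events)
    if n_events == 0:
        raise ValueError("at least one observed event is required")

    total = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk_set = [h for h, t in zip(log_risks, times) if t >= t_i]
        log_sum = math.log(sum(math.exp(h) for h in risk_set))
        total += log_risks[i] - log_sum

    penalty = l2 * sum(w * w for w in weights)
    return -total / n_events + penalty
```

When every patient has the same predicted log-risk, the loss depends only on the sizes of the risk sets; training lowers it by assigning higher log-risk to patients who die earlier.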

The RSF, DeepSurv and multivariate Cox regression models adopted the same set of predictors. A fivefold cross-validation approach was used to compare the survival prediction performance of RSF and DeepSurv against the standard Cox model in terms of precision, recall, AUC and C-index. The R packages *randomForestSRC* (V.2.9.3) and *ggplot2* (V.3.3.2) and the Python package *DeepSurv* (V.0.1.0) were used to generate the mortality prediction results.
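For readers unfamiliar with the C-index, the following sketch computes Harrell's concordance index by brute force. It illustrates the metric only and is not the evaluation code used in the study; the function name is hypothetical, and the quadratic loop would be too slow for a cohort of this size without optimization.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event. Higher risk_scores should mean earlier death.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by time to death.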

## Results

### Baseline characteristics

The study cohort included 273 876 patients (mean age: 65.4±12.7 years, male: 48.2%, diabetes duration=6.18±4.56 years) with a median follow-up of 142 (IQR=106–142) months, which corresponded to a total of 2 660 465 patient-years. The baseline demographics, clinical, laboratory and drug details are shown in tables 1 and 2 for continuous and discrete variables, respectively. The most prevalent comorbidities were hypertension, IHD and HF. The percentages of patients on n=0, 1, 2, 3 and 4 antidiabetic medications were 13.3%, 34.8%, 46.1%, 5.4% and 0.4%, respectively. At baseline, fasting glucose and HbA1c were 8.02±1.95 mmol/L and 7.75%±2.59%, respectively. The median numbers of fasting glucose and HbA1c measurements were 7 (IQR=4–11) and 7 (IQR=4–10), respectively. The different measures of variability for fasting glucose and HbA1c were quantified for subsequent use in predicting mortality (detailed methodology is shown in online supplemental table 2).

### Predictors of all-cause mortality

Over a median follow-up period of 142 (IQR=106–142) months, 91 155 deaths were recorded (33.3%), which corresponded to an annualized mortality rate of 3.43%. The significant univariate predictors of all-cause mortality are presented in table 3. All measures of variability for FBG and HbA1c were also significant predictors. The graphical comparison of HRs from quartile cut-offs of the FBG and HbA1c variability predictors is shown in online supplemental figure 1A,B, with the details summarized in online supplemental tables 3 and 4.
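The annualized mortality rate follows from dividing the number of deaths by the total follow-up time. The short check below uses the death count and the patient-years reported in this article; the function name is hypothetical.

```python
def annualized_rate_percent(deaths, patient_years):
    """Deaths per 100 patient-years of follow-up."""
    return 100.0 * deaths / patient_years


# Figures reported in the Results: 91 155 deaths over 2 660 465 patient-years.
rate = annualized_rate_percent(91_155, 2_660_465)
print(f"{rate:.2f}% per year")
```

This reproduces the reported rate of 3.43% per year.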

The following parameters remained significant predictors following multivariate adjustment (table 4): (1) age, male gender and baseline comorbidities or complications (hypertension, HF and AF, COPD, cancer, dementia, ischemic stroke, intracranial hemorrhage, aborted SCD, diabetic renal and ophthalmological complications), (2) laboratory tests (anemia, neutrophil-to-lymphocyte ratio (NLR); HDL-C, total cholesterol, triglyceride; mean HbA1c and mean FBG), (3) eight different measures of variability for HbA1c and FBG. A U-shaped relationship with all-cause mortality was observed for HDL-C, LDL-C and total cholesterol (figure 1A–C), but not for triglyceride (figure 1D). A U-shaped relationship was also observed for HbA1c but not for FBG (figure 1E,F).

### Development of a score-based predictive risk model based on Cox regression

A score-based predictive risk model for all-cause mortality was developed by incorporating significant predictors from multivariate analysis. One point was allocated for each significant predictor with an HR of less than 1.5, and 2 points for HRs between 1.5 and 2.5. Of the eight measures of variability for HbA1c and FBG, SD had the highest HR and greatest statistical significance when added to the multivariate model (FBG: HR=1.08, 95% CI 1.07 to 1.10, p<0.0001; HbA1c: HR=1.11, 95% CI 1.07 to 1.14, p<0.0001). It was therefore selected for inclusion in the mortality score. Altogether, the predictive risk model had a total score out of 25 (table 5). ROC analysis yielded an AUC of 0.729 (95% CI 0.727 to 0.731; online supplemental figure 2). The overall Kaplan-Meier curve and the Kaplan-Meier curves stratified by gender are shown in online supplemental figure 3 for patients with diabetes. The survival curve generated by the multivariate Cox regression model is shown in online supplemental figure 4.
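The point allocation rule can be sketched as a small function. The boundary handling is an assumption (the source does not say whether an HR of exactly 1.5 scores 1 or 2 points), the names are hypothetical, and the actual predictors and cut-offs are those in table 5, which is not reproduced here.

```python
def points_for_hazard_ratio(hr):
    """Map a significant predictor's HR to score points.

    Assumptions: 1 < HR < 1.5 gives 1 point and 1.5 <= HR <= 2.5 gives
    2 points; HRs outside these ranges score 0 in this sketch.
    """
    if 1.0 < hr < 1.5:
        return 1
    if 1.5 <= hr <= 2.5:
        return 2
    return 0


def risk_score(patient_flags, hazard_ratios):
    """Sum points over the predictors that are present for a patient."""
    return sum(
        points_for_hazard_ratio(hazard_ratios[name])
        for name, present in patient_flags.items()
        if present
    )
```

For instance, the reported HRs for the SD of FBG (1.08) and the SD of HbA1c (1.11) would each contribute 1 point under this rule.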

### Results of machine/deep learning approaches for risk modeling

An RSF model was then fitted to predict the mortality outcome. The optimal number of trees for the RSF model was 400, selected using a fivefold cross-validation approach to minimize the overall squared error rate in the testing set (online supplemental figure 5). The main results of using the RSF model to predict the mortality outcome of patients with diabetes are detailed in online supplemental figure 6. In the top left panel, the overall ensemble survival is indicated by the red line and the Nelson-Aalen estimator by the green line. The top right panel shows the Brier score (0=perfect; values above 0.25 are worse than guessing) stratified by ensemble mortality, based on the inverse probability of censoring weight method. The cohort was stratified into four groups of 0–25, 25–50, 50–75 and 75–100 percentile mortality (the overall, non-stratified Brier score is shown by the red line). The continuous ranked probability score, given by the integrated Brier score divided by time, is shown in the bottom left panel, while the mortality of each patient with diabetes versus the observed time of the mortality event is shown in the bottom right panel. Mortality events are shown as blue points and censored observations as red points. Predicted OOB survival and cumulative hazard from the RSF model are shown in online supplemental figure 7. The predicted survival curves of patients with diabetes from the RSF model are shown in online supplemental figure 8, where blue curves correspond to censored observations and red curves represent observations experiencing mortality events. The 10 most important predictors ranked by the RSF model are shown in online supplemental table 5.
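The 0.25 benchmark for the Brier score can be seen from a simplified calculation. The sketch below ignores censoring, whereas the analysis above uses inverse probability of censoring weights, so it illustrates the scale of the score rather than the exact computation; the function name is hypothetical.

```python
def brier_score(predicted_death_prob, died):
    """Mean squared difference between predicted probability and outcome."""
    if len(predicted_death_prob) != len(died):
        raise ValueError("inputs must have the same length")
    squared_errors = [(p - d) ** 2 for p, d in zip(predicted_death_prob, died)]
    return sum(squared_errors) / len(squared_errors)
```

Predicting a probability of 0.5 for every patient gives a squared error of 0.25 regardless of the outcome, which is why 0.25 marks the level of uninformed guessing, while perfect predictions give 0.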

Finally, we compared the survival analysis performance of the RSF model and DeepSurv, as typical machine learning and deep learning approaches, respectively, against the multivariate Cox model in predicting the mortality outcome of patients with diabetes using the fivefold cross-validation method. A Sobol solver36 was used to sample each hyperparameter of DeepSurv from a predefined range, and k-fold cross-validation (k=3) was used to evaluate the performance of the parameter configuration settings. For k-fold cross-validation, the dataset was split into k subsets, with one subset used as the test set and the remaining subsets as the training set to measure the prediction error. The roles of the test and training sets were switched until all subsets had been used as the test set, and a mean prediction error was derived.37 Using the configuration with the largest validation C-index on the testing set to avoid models that overfit, we selected the best hyperparameters of the DeepSurv network, which were: number of dense layers=4, learning rate=0.0003, ℓ2 regularization coefficient=3.25, dropout rate=0.36, exponential learning rate decay constant=0.0005 and momentum=0.86. In all instances, the ReLU activation function was applied.38
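The splitting procedure described above, in which each of k subsets serves once as the test set, is usually called k-fold cross-validation and can be sketched as follows. The function names and the `fit_and_score` callback are hypothetical placeholders for whatever model fitting and error metric are used.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_error(n_samples, k, fit_and_score):
    """Average the error returned by fit_and_score(train, test) over k folds."""
    errors = [fit_and_score(train, test)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(errors) / len(errors)
```

With k=3, as used for the hyperparameter search, each configuration is trained three times on two-thirds of the data and scored on the remaining third, and the three errors are averaged.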

The comparative performance of the different models is shown in table 6. Both the RSF and DeepSurv models significantly outperformed the multivariate Cox model (precision: 0.85 (95% CI 0.81 to 0.89), recall: 0.86 (0.82 to 0.89), AUC: 0.85 (0.82 to 0.91), C-index: 0.86 (0.81 to 0.90) for the DeepSurv model; precision: 0.85, recall: 0.87, AUC: 0.86, C-index: 0.87 for the RSF model) based on the same validation inputs of the risk predictors (P for trend <0.001). In addition, the Cox score (precision: 0.88 (0.83 to 0.92), recall: 0.87 (0.84 to 0.91), AUC: 0.86 (0.82 to 0.89), C-index: 0.87 (0.84 to 0.91)) demonstrated better performance than the multivariate Cox model (precision: 0.75 (0.72 to 0.79), recall: 0.73 (0.67 to 0.77), AUC: 0.73 (0.68 to 0.76), C-index: 0.73 (0.66 to 0.77)). The advantages of machine/deep learning approaches over the Cox model arise from their ability to describe survival data with both linear and nonlinear covariate effects. However, it should be noted that, in comparison with DeepSurv, RSF allows influential predictors to be identified more easily by generating an ‘importance ranking’ of the variables using standard bootstrap theory. This enables the predictive strength of the associated risk predictors to be investigated, so that clinicians can estimate mortality probability by referring only to the most important variables.

## Discussion

In this study, we developed a machine learning-driven predictive risk model for type 2 diabetes mellitus using a multiparametric approach with data from different domains. Our novel findings are that (1) measures of variability of fasting glucose and HbA1c show similar predictive power for all-cause mortality, regardless of whether adjustments were made for initial values or mean values across follow-up; (2) a multiparametric predictive risk model incorporating variables from different domains, including baseline demographics, comorbidities, laboratory tests and measures of variability of HbA1c and FBG, predicted all-cause mortality accurately; and (3) machine learning-driven algorithms further improved the accuracy of the predictive models.

Numerous factors have been associated with premature mortality in patients with type 2 diabetes mellitus. Prior epidemiological studies have identified key risk factors including age, comorbidities, healthcare utilization patterns and laboratory findings.39 40 In our study, we identified similar predictors, including advanced age, male gender, high neutrophil and low lymphocyte counts, increased levels of urea, creatinine and potassium, as well as reduced levels of HDL-C, LDL-C, triglycerides, total cholesterol and sodium. Moreover, U-shaped relationships of LDL-C, HDL-C and total cholesterol with all-cause mortality were found in our cohort. These findings are in keeping with the U-shaped relationships reported between cholesterol and all-cause mortality41 and for LDL-C42 in the general Korean population. Similar relationships were found for HDL-C, where extremely high HDL-C levels were paradoxically associated with higher mortality.43 The association between all-cause mortality and elevated creatinine, urea and potassium, which are classic features of renal failure, is supported by evidence suggesting that the Asian population has a higher risk of developing diabetic nephropathy compared with Caucasians.11 It is widely accepted that current predictive models, which have largely been developed using Western cohorts, provide only moderate levels of accuracy and are at times not readily applicable to disease management protocols that vary by country. Development of country/territory-specific risk prediction models allows local population-based confounders and clinician management approaches to be incorporated, thus providing more accurate risk prediction for the local population.

Diabetes mellitus is characterized by the presence of systemic chronic inflammation, which is accompanied by increased oxidative stress. To quantify the degree of inflammation, the NLR has been used as a surrogate measure, as it reflects the balance between proinflammatory and anti-inflammatory pathway activation. In our cohort, we found that raised NLR was associated with all-cause mortality risk. This extends previous findings from our group and others that increased NLR is associated with insulin resistance in patients with newly diagnosed type 2 diabetes,44 the progression of diabetic nephropathy45 and complications in diabetes.46 Moreover, the increased oxidative stress in diabetes can induce adverse remodeling of the heart, which in turn increases the risk of HF, arrhythmias and cardiovascular mortality.47 48

Glycemic variability refers to fluctuations in glucose levels and can be measured as daily variation or as variation between different clinical visits.49 50 Similarly, variability in HbA1c levels has been quantified. Both measures have been associated with a higher risk of complications and mortality in patients with diabetes mellitus in both randomized controlled trials and real-world settings.51–55 Several methods can be used to calculate variability, such as SD, coefficient of variation and scores based on the frequency with which a fixed percentage change in absolute values is exceeded. Prior studies have demonstrated the importance of such measures of variability in the prediction of adverse outcomes,16 17 but a systematic and direct comparison of the predictive performance of different methodologies has not been made. In our study, eight different measures of variability for HbA1c and FBG were compared, all of which showed significant predictive value. Our findings illustrate that temporal variability in these laboratory tests is important, regardless of the methodology employed for its calculation. In our study, we also found that mean FBG did not predict mortality. Instead, all of the different measures of its variability were predictive, suggesting that it is intermittent poor glucose control rather than chronic hyperglycemia that is more closely associated with all-cause mortality.

Standard survival models such as the Cox proportional hazards model are semiparametric models that estimate the effects of observed patient covariates on the mortality risk outcome. The Cox model assumes that the effect of each covariate is proportional. However, in many practical applications this assumption does not hold, and information carried by the observed patient covariates risks being lost. Furthermore, the model cannot account for U-shaped relationships because only a single HR is derived for each covariate. Therefore, numerous nonlinear survival models have been developed to better fit survival data, either with nonlinear log-risk functions (eg, time-encoded methods56) or by learning the nonlinear relationship directly using machine learning and deep learning techniques (eg, feed-forward neural network risk-predicting methods57). The RSF model,19 which is constructed from an ensemble of binary decision trees, has been identified as an alternative to the Cox proportional hazards model for analyzing time-to-event survival data when the linear proportional hazards assumption is violated. DeepSurv,58 whose multilayer perceptron architecture is deeper than Faraggi-Simon’s feed-forward model and which minimizes the negative log Cox partial likelihood with a risk function that is not necessarily linear, can efficiently learn complex nonlinear relationships between patient covariates and mortality outcome.

For model selection among the traditional Cox model, the Cox-based score model, RSF and DeepSurv in risk prediction tasks, there is a trade-off: (1) traditional Cox models (as well as Cox-based score models) offer good interpretability but less accurate predictions, since they do not consider nonlinear interdependent patterns among the variables; (2) machine learning or deep learning-based models significantly improve prediction performance, especially when the cohort is large (n>1000), but some (eg, DeepSurv) may not provide good interpretations of the resulting predictions. Prediction accuracy and model interpretability are the two most important considerations when selecting a risk prediction model for clinical use. This study demonstrates the superiority of the RSF model for risk prediction, owing to both its high prediction accuracy and its good interpretability.

The findings of this study illustrate that machine/deep survival learning models can better capture the highly complex and nonlinear relationships between prognostic variables and an individual patient’s risk of mortality without prior variable selection or domain knowledge, compared with the traditional Cox analysis model. Application of machine/deep learning to survival analysis performs much better than the standard Cox model in predicting mortality risk of patients with diabetes mellitus. Additionally, machine/deep survival learning models will enable clinicians to provide personalized survival estimations based on the computed probability of mortality risk. In practice, medical researchers can use machine/deep survival learning models to improve overall survival prediction performance based on prognostic characteristics of the patients with diabetes mellitus and subsequently inform early efficient treatment options and even reduce mortality risk.

### Strengths and limitations

The following strengths of our study should be noted. First, this was a territory-wide study with a large number of patients and complete mortality follow-up over 10 years, owing to the linkage of the electronic health records to the death registry. Second, the availability of different data types, including prior comorbidities, laboratory test results (with longitudinal data) and drug details, meant that we were able to build a comprehensive risk model for accurate prediction. Third, the application of the latest machine learning techniques further improved the risk predictions of the models.

However, some limitations should be noted. First, this was a retrospective study and therefore carries the potential for bias, such as information bias, that is found in all studies of this type. Second, as with all studies using administrative databases, undercoding is a possibility. This was nevertheless mitigated by our definition of diabetes, which included patients with the appropriate ICD coding as well as those who were on any diabetic medication or met the criteria for diabetes by either HbA1c or fasting glucose results. Patients with type 1 diabetes mellitus were not included given their different disease course and pathogenesis, and further research is needed to explore whether the present findings can be extrapolated to patients with type 1 diabetes mellitus. Third, although the deep neural network survival learning approach shows significant potential for providing much more accurate predictions, the model’s weak interpretability remains the main obstacle to its application in clinical practice. Developing interpretable deep survival learning models that provide highly accurate predictions with supporting explanations for patients with diabetes mellitus will be the focus of our future research.

## Conclusion

A multiparametric model incorporating variables from different domains predicted all-cause mortality accurately in type 2 diabetes mellitus, and a machine/deep learning-driven approach provided further improvements in risk prediction.

## Data availability statement

Data are available in a public, open access repository. Data are available on reasonable request. An anonymized version of the dataset has been deposited on Zenodo (https://zenodo.org/record/4383385), in full compliance with University Regulations and Policy on Dataset Deposit and Sharing. For additional information: https://libguides.lib.cuhk.edu.hk/RDM/dataset_deposit.

## Ethics statements

### Ethics approval

The study was approved by The Joint Chinese University of Hong Kong—New Territories East Cluster Clinical Research Ethics Committee.

## References

## Footnotes

Contributors SL, JZ: data analysis, data interpretation, statistical analysis, manuscript drafting, critical revision of manuscript. KSKL, WTW, ICKW, TL, WKKW, KJ: project planning, data acquisition, data interpretation, critical revision of manuscript. QZ, GT: study conception, study supervision, project planning, data interpretation, statistical analysis, manuscript drafting, critical revision of manuscript.

Funding Health and Medical Research Fund of Hong Kong Food and Health Bureau: 16 171 991 (to QZ).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.
