Derivation and external validation of a risk prediction algorithm to estimate future risk of cardiovascular death among patients with type 2 diabetes and incident diabetic nephropathy: prospective cohort study



ABSTRACT
Objective To derive and externally validate a risk score for cardiovascular death among patients with type 2 diabetes and newly diagnosed diabetic nephropathy (DN).
Research design and methods Two independent prospective cohorts with type 2 diabetes were used to develop and externally validate the risk score. The derivation cohort comprised 2282 patients with an incident, clinical diagnosis of DN. The validation cohort included 950 patients with an incident, biopsy-proven diagnosis of DN. The outcome was cardiovascular death within 2 years of the diagnosis of DN. Logistic regression was applied to derive the risk score for cardiovascular death from the derivation cohort, and the score was then externally validated in the validation cohort. Cardiovascular risk was also estimated by applying the United Kingdom Prospective Diabetes Study (UKPDS) risk score in the external validation cohort.
Results The 2-year cardiovascular mortality was 12.05% and 11.79% in the derivation cohort and validation cohort, respectively. Traditional predictors (age, gender, body mass index, blood pressure, glucose and lipid profiles) alongside novel laboratory test items covering five test panels (liver function, serum electrolytes, thyroid function, blood coagulation and blood count) were included in the final model. The C-statistic was 0.736 (95% CI 0.731 to 0.740) and 0.747 (95% CI 0.737 to 0.756) in the derivation cohort and validation cohort, respectively. The calibration slope was 0.993 (95% CI 0.974 to 1.013) and 1.000 (95% CI 0.981 to 1.020) in the derivation cohort and validation cohort, respectively. The UKPDS risk score substantially underestimated cardiovascular mortality.
Conclusions A new risk score based on routine clinical measurements that quantifies individual risk of cardiovascular death was developed and externally validated. Compared with the UKPDS risk score, which underestimated the cardiovascular disease risk, the new score is a more specific tool for patients with type 2 diabetes and DN. The score could work as a tool to identify individuals at the highest risk of cardiovascular death among those with DN.

Significance of this study
What is already known about this subject?
► Cardiovascular disease has been found to be the primary cause of death in people with diabetic nephropathy (DN).
► No risk scores have been developed and externally validated to predict short-term cardiovascular mortality in cohorts of patients with DN and type 2 diabetes.
What are the new findings?
► Using two independent prospective cohorts, a new risk score to predict cardiovascular death within 2 years of the diagnosis of DN was developed and externally validated with good discrimination and calibration.
► Unlike the new score, the UKPDS score substantially underestimated cardiovascular mortality.
How might these results change the focus of research or clinical practice?
► The score is derived from routine clinical measurements that are commonly available for patients with DN in either an outpatient or an inpatient setting.
► The score could work as a screening tool to identify individuals at the highest risk of cardiovascular death among those with DN.

BACKGROUND
Diabetic nephropathy (DN) is one of the most significant complications of diabetes mellitus and the most frequent cause of end-stage renal disease. 1 Cardiovascular complications, induced by accelerating arteriosclerosis, account for nearly 50% of all comorbidity and mortality in patients with type 2 diabetes, and those with renal insufficiency caused by diabetes have a yet greater risk of cardiovascular complications. 2 To reduce future cardiovascular mortality risk among patients with DN, risk algorithms predicting future individual absolute risk of cardiovascular mortality are required to help clinicians and patients to assess the management status and develop personalised care strategies. 3 In previous studies, a number of prognostic factors have been identified including microalbuminuria, 4 hypothyroidism, 5 osteopontin 6 and the metabolic syndrome. 7 However, few studies have developed risk algorithms to identify those patients with DN who are at particularly high risk of cardiovascular mortality, especially short-term cardiovascular mortality.
The aim of the present study was to develop and validate a multivariable risk algorithm to predict cardiovascular death within 2 years of the diagnosis of DN among patients with type 2 diabetes.

METHODS
Data source and study population
We used two independent prospective cohorts from the First Affiliated Hospital of Zhengzhou University, Henan, China: one (derivation) based on the electronic health record data from outpatient and inpatient registries to develop our cardiovascular mortality risk score and another (validation) based on biopsy registry cohort data for external validation. The diagnosis of type 2 diabetes was based on the American Diabetes Association criteria. 8

Derivation cohort
A total of 2282 patients in the derivation cohort were enrolled through the outpatient and inpatient departments (except the Department of Nephrology) of the First Affiliated Hospital of Zhengzhou University. This is the largest hospital in China and provides both primary and secondary care to Henan province residents. To develop a risk score that could potentially be applied to the general DN population, all patients with incident DN (based on patients' previous medical records) clinically diagnosed in the hospital between 1 January 2015 and 31 December 2016 were enrolled as the derivation cohort. Each patient's clinical diagnosis date was recorded as the enrolment date (baseline examination date). DN was defined as the presence of nephropathy in patients with type 2 diabetes and albuminuria >300 mg/g creatinine, or in patients with diabetes and albuminuria >30-300 mg/g creatinine and an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m². 9

External validation cohort
The study included 950 patients with type 2 diabetes and biopsy-proven DN newly diagnosed between 1 January 2015 and 31 December 2016. Each patient's biopsy-proven diagnosis date was recorded as the enrolment date (baseline examination date). Patients were either outpatients or inpatients under the Department of Nephrology, the First Affiliated Hospital of Zhengzhou University. The diagnosis of DN was based on histological characteristics, including glomerular hypertrophy, thickened capillary basement membranes, diffuse mesangial expansion (sclerosis), nodular mesangial sclerosis, exudative lesions such as capsular drop or fibrin cap, mesangiolysis, mescapillary microaneurysm or hyalinosis of afferent and efferent arterioles, using appropriate standards for renal biopsy including light microscopy, electron microscopy and immunofluorescence examination. 10 Patients with other glomerular disease concomitant with DN were excluded from this study. Renal biopsy was performed for precise diagnosis of renal lesions with the consent of each patient. The biopsy results were reviewed by three clinicians, and a diagnosis was made only on the agreement of more than two clinicians.
Defining cardiovascular death
Every patient included in the derivation cohort and the validation cohort was followed up for 2 years from the baseline examination; for example, follow-up would end on 5 January 2017 if the baseline examination took place on 6 January 2015. We defined the primary outcome as death with cardiovascular disease as the primary diagnosis over the 2-year follow-up. An outcome assessment committee at the First Affiliated Hospital of Zhengzhou University reviewed medical history and death certificates and determined the final underlying cause of death. Two clinicians in the committee independently verified the diagnosis, and discrepancies were adjudicated by discussion involving additional committee members. All clinicians in the committee were unaware of patients' baseline status.
Candidate predictors, missing data and power calculations
We initially identified two demographic characteristics (age and gender), three clinical measurements (body mass index (BMI), systolic blood pressure (SBP) and diastolic blood pressure (DBP)), six glucose and lipid profile measurements (fasting glucose, hemoglobin A1c (HbA1c), total cholesterol, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol and triglyceride) and 62 laboratory test items (covering full blood count, liver function and blood coagulation tests, serum electrolytes and renal function), all routinely measured among patients with incident type 2 DN in both outpatient clinics and inpatient departments. To minimize overfitting of the model and maintain adequate statistical power, two clinicians at the First Affiliated Hospital of Zhengzhou University independently reviewed all 62 laboratory test items and combined them into 16 integrated predictors (online supplementary table 1) based on their clinical use. Discrepancies were adjudicated by discussion involving an additional clinician. In total, 27 potential predictors were selected for the subsequent model development.
On the basis of 275 cardiovascular deaths and all 27 potential predictors before backward elimination in our derivation cohort, we had an effective sample size of 10.2 cardiovascular deaths per parameter, above the minimum requirement suggested by Peduzzi et al. 12

Ethical approval
Written informed consent was obtained from all participants before inclusion.
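The events-per-parameter check reported in the power calculation above can be reproduced with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the threshold of 10 events per parameter is the commonly cited rule of thumb, assumed here because the text does not state the exact minimum it applied.

```python
def events_per_parameter(n_events: int, n_parameters: int) -> float:
    """Ratio of outcome events to candidate model parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


# Figures taken from the text: 275 cardiovascular deaths, 27 candidate predictors.
epp = events_per_parameter(275, 27)

# Assumed rule-of-thumb minimum (not stated explicitly in the text).
ASSUMED_MINIMUM_EPP = 10
meets_minimum = epp >= ASSUMED_MINIMUM_EPP
print(f"events per parameter = {epp:.1f}, meets assumed minimum: {meets_minimum}")
```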
Figure 1 Assessing calibration in the derivation cohort (left) and the validation cohort (right) for cardiovascular death.

Statistical analysis for model derivation and external validation
We treated the occurrence of cardiovascular death within 2 years of the beginning of follow-up as a binary outcome measure. For each of the 27 potential predictors (covering 73 items as described above), we used a univariable logistic regression model to calculate the unadjusted ORs. For derivation of the risk prediction model, we initially included all candidate predictors in a multivariable logistic regression model. For the continuous predictors (age, BMI, SBP, DBP, HbA1c, glucose, total cholesterol, HDL, LDL and triglyceride), we tried both fractional polynomial terms and binary terms (using medians of measurements in the derivation cohort as cut-offs).
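As an illustration of the binary terms and unadjusted ORs described above: for a single binary predictor, the OR from a univariable logistic regression equals the cross-product ratio of the 2x2 table, so it can be sketched without fitting a model. The function names and the toy data are hypothetical, and treating values strictly above the median as "high" is an assumption (the text does not say how ties at the median were handled).

```python
from statistics import median


def dichotomise_at_median(values):
    """Binary term: 1 if the value is strictly above the cohort median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def unadjusted_odds_ratio(exposed, outcomes):
    """Cross-product odds ratio from a 2x2 table of exposure by outcome."""
    a = sum(1 for e, y in zip(exposed, outcomes) if e == 1 and y == 1)
    b = sum(1 for e, y in zip(exposed, outcomes) if e == 1 and y == 0)
    c = sum(1 for e, y in zip(exposed, outcomes) if e == 0 and y == 1)
    d = sum(1 for e, y in zip(exposed, outcomes) if e == 0 and y == 0)
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    return (a * d) / (b * c)


# Toy data, not from the study.
ages = [50, 55, 60, 65, 70, 75, 80, 85]
deaths = [0, 0, 1, 0, 1, 1, 0, 1]
age_high = dichotomise_at_median(ages)
odds_ratio = unadjusted_odds_ratio(age_high, deaths)
```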
For each of these continuous predictors, the term with the best model fit statistic in the full model (minimum Bayesian information criterion (BIC)) was used. Through backward elimination, we excluded five predictors (covering 19 items) from the multivariable model as they were not statistically significant (p>0.1 based on change in log likelihood). 13 After elimination, we reinserted each excluded predictor into the final model to check whether it became statistically significant. We also rechecked fractional polynomial terms at this stage and re-estimated them where necessary. Finally, 27 parameters (binary or polynomial terms) from 22 predictors remained in the final model.
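The choice between a fractional polynomial term and a binary term by minimum BIC can be sketched as follows, using the standard definition BIC = k ln(n) − 2 ln(L). The log-likelihoods and parameter counts below are invented for illustration and are not results from the study; only the cohort size of 2282 comes from the text.

```python
import math


def bic(log_likelihood: float, n_parameters: int, n_observations: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


# Hypothetical full-model fits that differ only in how one continuous
# predictor is coded: (log-likelihood, number of parameters).
candidates = {
    "binary_term": (-800.0, 28),
    "fractional_polynomial": (-795.0, 29),
}

n = 2282  # size of the derivation cohort
scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this toy comparison the extra parameter costs ln(2282) ≈ 7.7 BIC units, so the fractional polynomial is preferred only because it improves −2 ln(L) by more than that.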
We formed risk equations for predicting the log odds of cardiovascular death by using the estimated regression coefficients multiplied by the corresponding predictors included in our model together with the intercepts. This process ultimately led to equations for the predicted risk, $\text{predicted risk} = 1/(1 + e^{-\text{risk score}})$, where the 'risk score' is the predicted log odds of cardiovascular death from the developed model.
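A minimal sketch of the risk equation above, converting a linear predictor (the log odds) into a predicted probability. The intercept, coefficient values and predictor names are hypothetical placeholders, not the coefficients of the published model.

```python
import math


def predicted_risk(intercept: float, coefficients: dict, values: dict) -> float:
    """Predicted risk = 1 / (1 + exp(-risk_score)), risk_score being the log odds."""
    risk_score = intercept + sum(
        beta * values[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-risk_score))


# Hypothetical intercept and coefficients for two binary terms.
example_intercept = -2.0
example_coefficients = {"age_above_median": 0.8, "sbp_above_median": 0.4}
example_patient = {"age_above_median": 1, "sbp_above_median": 0}

risk = predicted_risk(example_intercept, example_coefficients, example_patient)
print(f"predicted 2-year risk: {risk:.3f}")
```

A risk score of 0 corresponds to a predicted risk of 0.5, and each unit increase in the risk score multiplies the odds by e.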
To facilitate model utilization in clinical practice, the logistic regression equations were transformed into prognostic score charts. The coefficients in the logistic regression equation were multiplied by 12.5 and rounded to the nearest integer to obtain the prognostic score per predictor. Multiplication by 12.5 was chosen to place the majority of the coefficients close to an integer, thereby minimizing the effects of rounding. The sum of all prognostic scores reflects a patient's probability of cardiovascular death. 14

We assessed the performance of the models in terms of the C-statistic and calibration slope (where 1.00 is ideal). The C-statistic represents the probability that, for any randomly selected pair of people with DN with and without the outcome, the patient with the outcome had a higher predicted risk. 15 A value of 0.50 indicates no discrimination and 1.00 represents perfect discrimination.

We then undertook internal validation to correct measures of predictive performance for optimism (overfitting) by bootstrapping 100 samples of the derivation data. We repeated the model derivation process in each bootstrap sample to produce a model, applied the model to the same bootstrap sample to quantify apparent performance and applied the model to the original dataset to test model performance (calibration slope and C-statistic) and optimism (difference between the test performance and apparent performance). We then estimated the overall optimism across all models.
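The score-chart conversion and the C-statistic described above can be sketched in a few lines. The coefficient values and predictor names are hypothetical; the C-statistic is computed by direct pairwise comparison, with ties counted as one half (a common convention, assumed here). Note that Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the tie handling used in the study.

```python
def prognostic_points(coefficients: dict, multiplier: float = 12.5) -> dict:
    """Scale each regression coefficient and round to the nearest integer."""
    return {name: round(beta * multiplier) for name, beta in coefficients.items()}


def total_points(points: dict, patient: dict) -> int:
    """Sum the points for the predictors present in a patient (binary terms)."""
    return sum(points[name] * patient.get(name, 0) for name in points)


def c_statistic(risks, outcomes) -> float:
    """Probability that a patient with the outcome has the higher predicted risk."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without the outcome")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # assumed convention for tied predictions
    return concordant / (len(cases) * len(controls))


# Hypothetical coefficients, not taken from the published model.
points = prognostic_points(
    {"age_above_median": 0.83, "sbp_above_median": 0.41, "hdl_above_median": -0.30}
)
score = total_points(points, {"age_above_median": 1, "hdl_above_median": 1})
auc = c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few thousand patients; a rank-based formula would be preferable for much larger data.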
We applied our risk prediction model to each patient with DN in the external validation cohort on the basis of the presence of one or more predictors. We examined the performance of this final model both in the derivation dataset and then in the external validation dataset in terms of discrimination by calculating the C-statistic. We examined calibration by plotting agreement between predicted and observed risks across tenths of the predicted risks.
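The calibration check across tenths of predicted risk can be sketched as below: patients are sorted by predicted risk, split into ten groups of near-equal size, and the mean predicted risk in each group is compared with the observed event proportion. The function name and toy data are hypothetical, and the grouping rule (equal-sized groups by rank) is an assumption about how the tenths were formed.

```python
def calibration_by_tenths(risks, outcomes, groups: int = 10):
    """Return (mean predicted risk, observed proportion) for each risk group."""
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_predicted = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows


# Toy data, not from the study: 20 patients with evenly spaced predicted risks.
toy_risks = [0.05 * i for i in range(20)]
toy_outcomes = [i % 2 for i in range(20)]
rows = calibration_by_tenths(toy_risks, toy_outcomes)
```

Plotting the observed proportion against the mean predicted risk for each row gives the calibration plot; points on the diagonal indicate good agreement.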
We also applied the United Kingdom Prospective Diabetes Study (UKPDS) risk score 16 in the external validation cohort to estimate the 2-year risk of cardiovascular disease (CVD) in terms of model calibration, to test whether this general risk score for patients with type 2 diabetes would still be suitable for patients with type 2 diabetes and DN. As the smoking information was not accessible in this study, estimates were made assuming both the lowest (all non-smoking) and highest (all smoking) scenarios.
We used Stata V.15.0 for all statistical analyses. This study was conducted and reported in line with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines. 17

RESULTS
Study participants
In our derivation cohort, we analyzed information on 2282 patients with DN, with 275 cardiovascular deaths within 2 years. Our validation cohort had information on 950 patients with DN, with 112 cardiovascular deaths. Table 1 summarizes the basic characteristics and potential predictors of the study population. For general characteristics, patients in the derivation cohort were older, included a higher proportion of males and had higher DBP and SBP than those in the validation cohort. Test items within the thyroid function, blood coagulation, full blood count and lipid profile test panels were similar between the derivation cohort and the validation cohort. Most test items in the liver function test panel were generally higher among patients in the derivation cohort than among those in the validation cohort, except for pre-albumin and alkaline phosphatase, which were lower in the derivation cohort, and cholinesterase, albumin and globulin, which were similar between the two cohorts. Most test items in the renal function test panel were generally lower in the derivation cohort than in the validation cohort, except for eGFR, which was higher in the derivation cohort, and urine pH and urine-specific gravity, which were similar between the two cohorts.

Model derivation, performance measure and validation
In the derivation dataset, the absolute risk of cardiovascular death within 2 years was 12.05%. Of the 27 candidate predictors (online supplementary table 1), 22 predictors (27 parameters) were statistically significantly associated with cardiovascular death in the final multivariable model (table 2). Table 2 shows the apparent and internal validation performance statistics of the risk prediction model. After adjustment for optimism, the final risk prediction model was able to discriminate patients with DN with and without cardiovascular death with a C-statistic of 0.736 (95% CI 0.731 to 0.740). The agreement between the observed and predicted proportions of cardiovascular death showed good apparent calibration (figure 1, left). The optimism-adjusted calibration slope was 0.993 (95% CI 0.974 to 1.013) for cardiovascular death (table 3).

External validation
In the external validation cohort, the absolute risk of cardiovascular death was 11.79%. Applying our final risk prediction model to this independent population gave a C-statistic of 0.747 (95% CI 0.737 to 0.756) for cardiovascular death and good calibration (figure 1, right), with a calibration slope of 1.000 (95% CI 0.981 to 1.020).
A substantially underestimated cardiovascular risk was observed in the external validation cohort when applying the UKPDS risk score for 2-year risk both with (highest risk) (figure 2, right) and without (lowest risk) (figure 2, left) smoking information. Figure 3 gives a clinical example of the application of prognostic score charts with graphical illustrations for the cardiovascular death prediction model to predict 2-year risk of cardiovascular deaths.

Main findings
A new risk prediction algorithm has been developed in this study to quantify the absolute risk of cardiovascular death within 2 years in a prospective cohort of Chinese patients with an incident diagnosis of DN. The prediction model was then externally validated in another independent prospective cohort. The risk prediction model demonstrated useful discrimination and excellent calibration, with C-statistics of >0.70 in both the derivation cohort and the external validation cohort. The model was derived from clinical measurements routinely recorded and accessible in diabetes care settings (outpatients and inpatients), indicating that it can be readily applied in routine diabetes care (eg, by embedding it in medical administration software).
Comparison with previous studies
van der Sande et al developed a model to predict cardiovascular events within 3 years among patients with prevalent DN treated with angiotensin receptor blockers. 18 Age, gender, smoking, SBP, urinary albumin/creatinine ratio, eGFR, albumin and phosphate were included as predictors. However, the model performed poorly and yielded a C-statistic of 0.61 (95% CI 0.59 to 0.64) with a calibration slope generally >1.00 (ie, the model overpredicted risk).
Previous risk prediction models have not fully addressed cardiovascular disease itself as the primary cause of death in patients with newly diagnosed DN. Awareness of the absolute risk of cardiovascular death over the following 2 years could inform clinicians' discussions with patients, and the urgency and intensity with which they provide preventive cardiovascular care to patients with a high-risk profile, and could lead to an overall reduction in health costs. Implementation could be tested using a randomized controlled trial, with health economic assessment, and could include embedding alerts into practice software and increasing patients' awareness of their risk.
Comparison with other risk scores
The UKPDS risk score was developed for patients with newly diagnosed type 2 diabetes to estimate their 1-year to 10-year CVD risk. The algorithm performs well for the general population with type 2 diabetes. 16 We applied the UKPDS risk score in the external validation cohort and found that it substantially underestimated the 2-year CVD risk. This might be because levels of the UKPDS predictors (age, sex, glucose, SBP and lipid levels) differ in the population with DN compared with the wider type 2 diabetes population. 3 This suggests that the UKPDS risk engine might not be suitable for guiding clinical management among patients with type 2 diabetes and DN. Our new risk score would be a more specific risk calculator for such patients.

Strengths and limitations
Our prediction model has several advantages over those applied elsewhere. The risk score is based on absolute risk, derived and validated in two prospective cohorts. Demographic and clinical measurements routinely recorded in both outpatient and inpatient settings were used to derive the prediction model, which indicates that it can be readily embedded into online tools for application in either setting. Furthermore, compared with patients with type 2 diabetes, those with DN are on a fast track to a cardiovascular event, including premature cardiovascular death. 1 3 The identification of individuals at high risk of cardiovascular death in the short term could help clinicians to prioritize new therapy (such as sodium glucose co-transporter-2 (SGLT-2) inhibitors) to delay a fatal (or non-fatal) cardiovascular event.
The approaches used to develop and validate the present model are similar to those for other risk prediction models derived from the Clinical Practice Research Datalink (CPRD) and QResearch studies. 19 20 The predictors in our final model are accurate and reliable clinical variables that are routinely recorded in outpatient and inpatient settings, updated and reviewed for patients with DN and type 2 diabetes, and less varied than in other datasets. Moreover, the volume of missing values was relatively low, which makes variation in potential external applications less likely, although we applied multiple imputation. Caution is needed in interpreting the association between these predictors and the outcome, as the multiple imputation used might introduce information bias because the proportion of missing data was high for some predictors. This is likely to be less important as the aim of this study was to develop a risk score rather than to investigate the causal association between exposure (predictor) and the outcome: multiple imputation of predictors is a good approach for model derivation to improve prediction accuracy. 21

In this study, the definition of an incident clinical diagnosis of DN was based on existing medical records. Naturally, the timing of the actual onset of DN would be unknown (as with many other non-communicable diseases), and the inclusion of prevalent cases would therefore be possible. However, renal function tests (ie, creatinine) were used routinely among patients with type 2 diabetes to prospectively screen for chronic kidney disease, and the possibility of enrolling prevalent cases was low.
Restricted by the current sample size and to facilitate utilization of the risk score, only polynomial terms of traditionally well-known prognostic factors (age, BMI, SBP, DBP, glucose, HbA1c, total cholesterol, LDL, HDL and triglycerides) were tested; polynomial terms of the laboratory items were not tested in the current study. Further studies with larger sample sizes and more polynomial terms of laboratory items are warranted. We acknowledge that antidiabetes treatments, diabetes duration, history of cardiovascular disease, antihypertensive treatment, lifestyle risk factors (such as smoking) and other comorbidities were not taken into account as a result of limitations in the original data. However, some of these prognostic factors are very common in people with diabetes (such as antihypertensive treatment, which is used in 81.2% of patients with type 2 diabetes 22 ) and, as a result, would be less discriminatory in the model. We also believe that at least some of the clinical measurements incorporated in the prediction model could serve as proxies for these inaccessible predictors. Given the sample size of the derivation and validation cohorts, further independent external validations (eg, with external data from other low-income and middle-income countries) with larger sample sizes are warranted. As the risk score was derived and externally validated in a Chinese population with type 2 diabetes and incident DN, further validations in other ethnic groups are warranted. Multicollinearity could exist between predictors in this study. However, instead of quantifying a causal association, the goal of this study was to make a prediction, which is less likely to be influenced by multicollinearity. 23

CONCLUSIONS
In conclusion, this is the first study to derive a prediction model to quantify the 2-year absolute risk of cardiovascular death among patients with type 2 diabetes and newly diagnosed DN. Our risk algorithm has three useful implications for DN practice. First, the risk score can be used as a screening tool to identify patients with a high probability of cardiovascular death. The risk score is based on readily accessible clinical information routinely recorded in either outpatient or inpatient settings and evaluated by diabetes management teams. It can be readily embedded into health administration computer systems or developed into a mobile application for a handheld device for ease of use. Second, the risk prediction score could be applied to establish new treatment thresholds in diabetes clinical practice through consensus development of guidelines. Third, this new risk score is a more specific risk calculator for DN than the UKPDS risk score.