Development and validation of a clinical prediction rule for development of diabetic foot ulceration: an analysis of data from five cohort studies

Introduction
The aim of the study was to develop and validate a clinical prediction rule (CPR) for foot ulceration in people with diabetes.

Research design and methods
Development of a CPR using individual participant data from four international cohort studies identified by systematic review, with validation in a fifth study. Development cohorts were from primary and secondary care foot clinics in Europe and the USA (n=8255, adults over 18 years old, with diabetes, ulcer free at recruitment). Using data from monofilament testing, presence/absence of pulses, and participant history of previous ulcer and/or amputation, we developed a simple CPR to predict who will develop a foot ulcer within 2 years of initial assessment and validated it in a fifth study (n=3324). The CPR’s performance was assessed with C-statistics, calibration slopes, calibration-in-the-large, and a net benefit analysis.

Results
CPR scores of 0, 1, 2, 3, and 4 had a risk of ulcer within 2 years of 2.4% (95% CI 1.5% to 3.9%), 6.0% (95% CI 3.5% to 9.5%), 14.0% (95% CI 8.5% to 21.3%), 29.2% (95% CI 19.2% to 41.0%), and 51.1% (95% CI 37.9% to 64.1%), respectively. In the validation dataset, calibration-in-the-large was −0.374 (95% CI −0.561 to −0.187) and calibration slope 1.139 (95% CI 0.994 to 1.283). The C-statistic was 0.829 (95% CI 0.790 to 0.868). The net benefit analysis suggested that people with a CPR score of 1 or more (risk of ulceration 6.0% or more) should be referred for treatment.

Conclusion
The clinical prediction rule is simple, using routinely obtained data, and could help prevent foot ulcers by redirecting care to patients with scores of 1 or above. It has been validated in a community setting, and requires further validation in secondary care settings.

Age was not a statistically significant predictor in either the PODUS analyses or in the tenth study. Increasing duration of diabetes increased the odds of foot ulcer in our analyses, but decreased the odds in the tenth dataset. Non-linearity in relationships was checked as an explanation of these results, but no evidence was found of any non-linearity. [12] It turned out that the tenth dataset had very few women (<2%) and so could not be used to confirm sex as a predictor.

Sample size
There was no sample size calculation as the datasets were pre-existing and the analysis simply used all the data available. However, recent sample size guidance [13,14] was checked, and the size of the dataset and number of outcomes were adequate for the analyses.
The development datasets had 8255 participants, who had 430 ulcer outcomes, which gave 143 events per predictor parameter. Assuming a conservative model performance (Cox-Snell R² of 15%), this exceeded the recommended minimum sample size for model development. [14] In the validation dataset, 295 participants were removed from the analysis as they had already contributed data to one of the development datasets. [2] This reduced the validation dataset from 3707 to 3412. The validation dataset had 128 ulcer outcomes, again exceeding the recommendation of at least 100 events and 100 non-events to validate model performance in an external dataset. [15]

Missing data
The percentage of participants that had missing data for predictors or outcome was calculated. Some participants were missing data on previous history of ulceration or amputation. As both ulceration and amputation are important to record, it was judged that missing data on these events meant that the participant had no previous history of ulceration or amputation. Therefore missing data for history was recoded as negative for this predictor. For the other predictors, other methods for dealing with missing data, such as multiple imputation, would have been considered, but as the proportion of missing data in the development studies was <2.4% and in the validation dataset <2.6%, a complete-case analysis was performed.
Participant flowcharts for each study (see supplementary material) show where data were missing and the effect of recoding of missing data for previous history.

Handling competing risk of death
Some participants died in each study before the end of follow-up. In the community-based studies, one death was recorded in the largest dataset over its two-year follow-up period, [1] and 59 people died in the dataset with one-year follow-up. [2] In the two secondary care studies, one recorded that 13 people had died, [6] but death was not recorded in the other study. [5] Since death was not systematically recorded in all studies, participants were included whether or not it was known they had died, provided they had complete data on the predictors and outcome before their death. The total number of known deaths (73) comprised less than 1% of the analysis dataset. The CPR therefore assumes that people who died before developing a foot ulcer would not have had a foot ulcer by two years.
During the two-year follow-up period of the validation dataset, 95 patients died, 2.8% of 3412. We applied the same method as to the development datasets, and included these people in the analyses if they had complete data on predictors and outcome.

Statistical analysis plan
The analysis used a logistic regression model with random effects on the intercept, so that each study could have a different baseline risk of ulcer by two years. However, one of the development studies only had follow-up for one year, not two [2]. This study did not contribute to the overall estimate of baseline risk in the prognostic model, but it was allowed to contribute to the estimates of odds ratios (see supplementary material), which were deemed similar enough at one year or two years to combine (and preferable to simply excluding the study and losing a large number of participants).
After model development, the potential for overfitting was estimated by calculating a heuristic shrinkage factor. Shrinkage estimates close to one suggest that the model's estimates are not optimistic (i.e. overfitting is of little concern), whereas smaller shrinkage estimates suggest that the model's predictions are optimistic and should be shrunk.
When calculating risks for each score of the CPR, population-averaged risk estimates were calculated, which use the random effects distribution of baseline risks rather than one summary estimate of baseline risk, to allow for the data being clustered in studies. Population-averaged estimates are considered to be more generalizable to participants in new studies. [16]
Steyerberg's method for developing a clinical prediction rule from a statistical model was used [17]. However, the step where the coefficients of predictors are made smaller to compensate for overfitting was omitted. Overfitting is the name given to the phenomenon where statistical models tend to perform better in the datasets they were derived from than in independent datasets. This is a particular problem for small datasets, complex models, or large numbers of predictors. The CPR dataset was large, the model simple, and the number of predictors small; the extent of overfitting was estimated with shrinkage factors and found to be negligible. Shrinkage was >0.999 in all cases.
Shrinkage was estimated by: [18]

$$\text{shrinkage} = \frac{\text{model } \chi^2 - \text{df}}{\text{model } \chi^2}$$

Where overfitting occurs, it is recommended that the coefficients of the model are adjusted by multiplying by the shrinkage factor.
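As an illustration of how a heuristic shrinkage factor of this kind can be computed and applied, the following minimal Python sketch assumes the commonly used form (model chi-square minus degrees of freedom, divided by model chi-square). The function names, the chi-square value, and the coefficients are hypothetical placeholders, not values from the analysis.

```python
def heuristic_shrinkage(model_chi2: float, df: int) -> float:
    """Heuristic shrinkage factor: (model chi-square - df) / model chi-square.

    ASSUMPTION: model_chi2 is the likelihood ratio chi-square of the fitted
    model and df is the number of predictor parameters.
    """
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return (model_chi2 - df) / model_chi2


def shrink_coefficients(coefs, factor):
    """Multiply each predictor coefficient by the shrinkage factor."""
    return [b * factor for b in coefs]


# HYPOTHETICAL inputs: a three-parameter model with an invented chi-square.
factor = heuristic_shrinkage(model_chi2=600.0, df=3)

# HYPOTHETICAL log odds ratios, shrunk towards zero by the factor.
shrunk = shrink_coefficients([1.0, 0.9, 2.0], factor)

print(f"shrinkage factor: {factor:.4f}")
print("shrunk coefficients:", [round(b, 4) for b in shrunk])
```

A factor close to one leaves the coefficients almost unchanged, which is the situation described above where the adjustment step could reasonably be omitted.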
As the development dataset comes from four studies, this was accounted for in the analysis by allowing the individual studies to have different baseline risk of ulcer by two years. However, in one of the studies the length of follow-up was only one year [2]. Therefore, the baseline risk of ulcer in this study was lower than in the other studies, as the participants had less time to develop an ulcer.
To address this, the study's Principal Investigator attempted to obtain longer-term follow-up data with limited success, and the PODUS 2020 steering committee advised not to use the longer-term follow-up data.
The prediction model's baseline risk of ulcer at two years was estimated with data from the three studies with at least two-year follow-up [1,5,6]. First, a logistic regression model was fitted with study as a predictor in addition to the three clinical predictors to obtain baseline risk estimates for each study. Then a random effects meta-analysis of the three study-specific baseline risk estimates was conducted to obtain an overall baseline risk estimate for the prediction model, giving an estimated risk at two years conditional on the three clinical predictors. The estimates for the three

The implementation of Steyerberg's method was:
1. Fit the logistic regression model with monofilament, pulses, previous history of ulcer/amputation, and study as predictors. This gives coefficients showing by how much the log odds changes when monofilaments, pulses, or history change from test-negative to test-positive and estimates of baseline risk for each study. The software used was SAS PROC LOGISTIC (SAS 9.4 www.sas.com) with maximum likelihood estimation.
2. Perform a random-effects meta-analysis of the three estimates from the studies with two years of follow-up to get a single overall estimate of baseline risk.
3. Use this overall estimate and the regression coefficients for the three predictors to calculate the probability of ulcer for each possible predictor combination. There are three binary predictors and therefore eight possible predictor combinations.
4. Multiply and round the coefficients of the predictors to get a CPR scoring scheme, bearing in mind that predictor combinations with similar risk of ulcer should have the same score.
5. Repeat Step 1 and Step 2, only using the CPR score instead of monofilaments, pulses, and history.
6. Calculate probability of ulcer for each score using a population average method. [16] The population average method should produce estimates with better calibration in external datasets and generalisability to people recruited to new studies than simply using the CPR logistic regression equation.

Note that a flowchart for the Monteiro-Soares study was omitted as there were no missing data.

The intercept of -3.81 came from the random effects meta-analysis of study-specific baseline risk of the three studies with two-year follow-up data. Based on this model, predicted risks are:
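Steps 2 to 4 of the procedure can be sketched in Python as follows. This is an illustration only, not the SAS code used for the analysis. The study-specific intercepts, their variances, the log odds ratios, and the multiplier are all hypothetical placeholders, and DerSimonian-Laird pooling is an assumed choice of random-effects estimator, since the text does not state which one was used.

```python
import math
from itertools import product


def expit(x):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(estimates, variances):
    """Pool study estimates with a DerSimonian-Laird random-effects model."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, tau2


# Step 2: HYPOTHETICAL baseline log odds (and variances) for three studies.
study_intercepts = [-4.1, -3.8, -3.5]
study_variances = [0.02, 0.03, 0.05]
baseline, tau2 = dersimonian_laird(study_intercepts, study_variances)

# HYPOTHETICAL log odds ratios for the three binary predictors.
coefs = {"monofilament": 1.0, "pulses": 0.9, "history": 2.0}

# Step 4: multiply and round the coefficients to obtain integer points.
multiplier = 1.0  # HYPOTHETICAL choice
points = {name: round(beta * multiplier) for name, beta in coefs.items()}

# Step 3: risk of ulcer for each of the 2**3 = 8 predictor combinations.
rows = []
for combo in product([0, 1], repeat=len(coefs)):
    log_odds = baseline + sum(x * b for x, b in zip(combo, coefs.values()))
    score = sum(x * p for x, p in zip(combo, points.values()))
    rows.append((combo, score, expit(log_odds)))

for combo, score, risk in rows:
    print(combo, score, f"{risk:.3f}")
```

Listing the score next to the model-based risk for every combination makes it easy to check that combinations sharing a score also have similar predicted risk, which is the criterion named in step 4.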

Risk of bias assessment with PROBAST tool
Repeating the analysis with CPR score gave this equation: Again, overfitting for the model with CPR score, as assessed by the shrinkage factor (>0.999), was negligible. This equation was not used directly to calculate the risk of ulcer; instead, the population-averaged method of Pavlou et al. was used. [16] Here both approaches give similar results. The Pavlou method uses the distribution of random effects rather than just the point estimate of -3.73 when estimating the risks. The risk of ulceration for each score is given in Table 3 of the manuscript.
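The difference between plugging in a single intercept and averaging over the random effects distribution can be illustrated with a short Python sketch. It assumes a normally distributed random intercept and uses simple numerical integration; it is not claimed to reproduce the exact algorithm of Pavlou et al. The intercept of -3.73 is taken from the text, while the slope per CPR point and the between-study standard deviation are hypothetical placeholders.

```python
import math


def expit(x):
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def normal_pdf(u, sd):
    """Density of a Normal(0, sd^2) variable at u."""
    return math.exp(-0.5 * (u / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def population_averaged_risk(intercept, slope, score, tau, n_steps=2000):
    """Average expit(intercept + u + slope * score) over u ~ Normal(0, tau^2).

    Uses the midpoint rule on [-8 * tau, 8 * tau]. ASSUMPTION: the random
    intercepts are normally distributed with standard deviation tau.
    """
    linear = intercept + slope * score
    if tau == 0:
        return expit(linear)
    lo, hi = -8.0 * tau, 8.0 * tau
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        u = lo + (i + 0.5) * h
        total += expit(linear + u) * normal_pdf(u, tau) * h
    return total


INTERCEPT = -3.73  # point estimate quoted in the text
SLOPE = 1.0        # HYPOTHETICAL log odds ratio per CPR point
TAU = 0.5          # HYPOTHETICAL between-study SD of the intercept

for score in range(5):
    conditional = expit(INTERCEPT + SLOPE * score)
    averaged = population_averaged_risk(INTERCEPT, SLOPE, score, TAU)
    print(score, f"{conditional:.4f}", f"{averaged:.4f}")
```

When the between-study variation is small, the two columns are close, which is consistent with the remark above that both approaches give similar results here.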
Calibration in the validation dataset of the prognostic model and CPR score

Net benefit
The potential clinical utility of the CPR was assessed with a net benefit analysis. At a risk threshold of 6% the net benefit is 0 for treat none, and < 0 for treat all, but 0.015 for using the CPR. This can be interpreted as follows: if the decision were to treat patients with CPR scores of 1 and above, then 15 additional cases of ulcer at 2 years would be correctly identified for treatment by the CPR per 1000 individuals, without increasing the number treated unnecessarily. At a risk threshold of 14%, the number of additional cases of ulcer at 2 years correctly identified for treatment would be 10 per 1000 individuals. See decision curves in Figure S6.

Figure S6 Net benefit plot with decision curves for "treat none", "treat all", and "treat according to CPR score".
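The net benefit figures above follow the standard decision-curve definition: true positives per person, minus false positives per person weighted by the odds of the risk threshold. The Python sketch below illustrates the arithmetic. The validation totals (128 ulcers among 3412 participants) are taken from the sample size section and may differ from the complete-case sample actually analysed; the true and false positive counts for the CPR strategy are hypothetical placeholders, not results from the study.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit = TP/n - (FP/n) * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1.0 - threshold)
    return true_pos / n - (false_pos / n) * odds


N = 3412         # validation participants, as quoted in the text
EVENTS = 128     # ulcer outcomes in the validation dataset, as quoted
THRESHOLD = 0.06

# Treat none: nobody is treated, so there are no true or false positives.
nb_none = net_benefit(0, 0, N, THRESHOLD)

# Treat all: every event is a true positive, every non-event a false positive.
nb_all = net_benefit(EVENTS, N - EVENTS, N, THRESHOLD)

# Treat according to CPR score >= 1: HYPOTHETICAL counts for illustration.
nb_cpr = net_benefit(true_pos=100, false_pos=800, n=N, threshold=THRESHOLD)

print(f"treat none: {nb_none:.4f}")
print(f"treat all:  {nb_all:.4f}")
print(f"CPR (hypothetical counts): {nb_cpr:.4f}")
print(f"net true positives per 1000: {1000 * nb_cpr:.1f}")
```

Multiplying a net benefit by 1000 gives the "additional cases correctly identified per 1000 individuals" reading used in the text, so a net benefit of 0.015 corresponds to 15 per 1000.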