## Footnotes

Contributors MLA had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. Study concept: MLV, TJH, EWG, PZ. Study design: MLV, TJH, EWG, PZ. Acquisition of data: TJH. Drafting of the manuscript: MLA. Critical revision of the manuscript for important intellectual content: TJH, EWG, PZ. Statistical analysis: MLA. Interpretation of data: MLA, TJH, EWG, PZ. Review and approval of the manuscript: TJH, EWG, PZ, MLA.

Funding This research was supported by Contract Number 20072008727958, Task Order 40 from the Centers for Disease Control and Prevention (CDC) and by RTI International. The opinions in this paper are solely those of the authors and do not necessarily reflect the opinions of CDC or RTI.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Data sharing statement No additional data available.

### Significance of this study

#### What is already known about this subject?

Most type 2 diabetes risk equations in the literature focus on predictions for people older than 45 years of age.

The literature reports good levels of discrimination, but high levels of discrimination often arise when studies cannot exclude individuals with undiagnosed diabetes at baseline from the estimation sample.

#### What are the new findings?

One size does not fit all when identifying the most important risk factors for type 2 diabetes across different age groups. Relative and absolute risks vary by age.

The predictive capacity of equations based on biomarkers is, on average, better than that of equations based on self-reported variables, but information from biomarkers is more important in older populations than in younger ones. We find no significant difference in the area under the receiver operating characteristic curve between simple and enhanced equations in young adults.

#### How might these results change the focus of research or clinical practice?

A screening strategy based on self-reported variables in younger populations would be as effective as one that requires collecting clinical samples. For older populations, there is a tradeoff between a simple model that can be applied to more people, and an enhanced model that would be more accurate, but would require costly laboratory tests.

## Introduction

Several risk equations have been developed to identify those at high risk of developing type 2 diabetes1 2 using data from the Framingham Heart Study, the National Health and Nutrition Examination Survey (NHANES), Coronary Artery Risk Development in Young Adults (CARDIA), Atherosclerosis Risk in Communities (ARIC), the Cardiovascular Health Study (CHS), and the San Antonio Heart Study for various follow-up periods (from 5 to 24 years), each using different estimation methods. The variables used to generate predicted probabilities are common across many of these studies and include self-reported information, such as age, sex, race, medication use, family history of diabetes, body mass index (BMI) and/or waist circumference, smoking status, alcohol consumption, and food consumption. Enhanced risk equations also use clinical measures, such as systolic blood pressure, fasting plasma glucose (FPG), triglyceride, and high-density lipoprotein (HDL) cholesterol levels.

The development of simple yet accurate risk scores is important for risk stratification and prevention by clinical and public health interventions. Similarly, quantifying the absolute and relative risks for diabetes associated with combinations of key risk factors is essential for cost-effectiveness modeling efforts. However, cohort studies in the USA have generally been limited to specific segments of the population age range. The most important sets of risk factors, as well as the relative and absolute risks, may vary considerably by age.

In this analysis, we assembled data from three major US epidemiological studies to develop separate, age-specific diabetes risk equations. Our objectives were to (1) examine whether core risk factors and risk equations vary with age and (2) quantify the performance of simple risk equations, based on self-reported variables (age, sex, race, BMI, smoking status, family history, and binary indicators for high blood pressure and high cholesterol), and enhanced risk equations, which include added clinical variables (blood pressure, cholesterol, FPG, HDL, and triglycerides).

## Research design and methods

### Data

Study data were obtained from three epidemiological studies: CARDIA, ARIC, and CHS. The CARDIA study, initiated in 1985 to investigate lifestyle and other factors that influence the evolution of coronary heart disease (CHD) risk factors during young adulthood, recruited 5116 black and white women and men, aged 18–30 years, in four urban areas: Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; and Oakland, California.3 Participants were followed for 20 years.

ARIC, initiated in 1987, was conducted in four communities (Washington County, Maryland; Forsyth County, North Carolina; Jackson, Mississippi; and Minneapolis, Minnesota) by randomly selecting a cohort of 15 792 individuals aged 45–64 years. Participants were followed for 9 years. ARIC was designed to investigate the causes of atherosclerosis and its clinical outcomes, and the variation in cardiovascular risk factors, medical care, and disease by race and gender.

The CHS, initiated in 1989, enrolled 5888 men and women aged 65 years and older in four communities: Forsyth County, North Carolina; Sacramento County, California; Washington County, Maryland; and Pittsburgh, Pennsylvania. Eligible participants were sampled from Medicare eligibility lists in each area, and were followed for 7 years. The main objective of the CHS study was to identify factors related to the onset and course of CHD and stroke.

### Analysis samples and diabetes variable

Our dependent variable was the first incidence or diagnosis of type 2 diabetes. Therefore, we excluded participants with a previous diagnosis of type 2 diabetes at the time of enrollment. Diabetes status was assessed using slightly different methods and at different follow-up intervals in the three studies. In CARDIA, diabetes status was defined by FPG measurements and/or self-report of taking oral diabetes medications with or without insulin injections. In ARIC, diabetes status, based on self-reporting of previous diagnosis and glucose values, was available at triennial follow-up encounters (year 3: 1990–1992; year 6: 1993–1995; year 9: 1996–1998). In years 3 and 6, self-reported diabetes medication use was collected; in year 9, a 2-hour oral glucose tolerance test (OGTT) was also administered. In CHS, diabetes was defined annually by the new use of insulin or oral hypoglycemic medication and/or by FPG values. To create a uniform definition of diabetes across all datasets, we defined diabetes by reported physician diagnosis and/or FPG ≥7.0 mmol/L (126 mg/dL).
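The uniform definition above can be written as a small rule. The following is a minimal sketch, not the authors' code: the function name, argument names, and the handling of a missing FPG value (treated as not meeting the glucose criterion) are assumptions made for illustration.

```python
from typing import Optional

# Threshold stated in the text: FPG >= 7.0 mmol/L (126 mg/dL).
FPG_THRESHOLD_MMOL_L = 7.0


def has_diabetes(physician_diagnosis: bool, fpg_mmol_l: Optional[float]) -> bool:
    """Return True if either criterion of the uniform definition is met.

    Assumption (not from the paper): a missing FPG value (None) simply
    does not satisfy the glucose criterion.
    """
    if physician_diagnosis:
        return True
    return fpg_mmol_l is not None and fpg_mmol_l >= FPG_THRESHOLD_MMOL_L
```

For example, `has_diabetes(False, 7.2)` is true because the glucose criterion alone is sufficient, whereas `has_diabetes(False, 6.4)` is false.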

None of the three datasets included adults aged 31–44 years; we therefore created a fourth data set by splitting CARDIA into two samples. Individuals recruited for CARDIA at age 18–30 years were aged 28–40 years at the 10-year follow-up. Thus, our new data set, which we will refer to as CARDIA-10, included CARDIA participants who had not developed diabetes by year 10, with year 10 serving as the new sample baseline.

### Statistical analysis

To address the question of whether a particular set of variables would predict equally effectively across different age groups, we estimated simple and enhanced risk equations for four age groups (18–30, 28–40, 45–64, and 65 and older). Each equation used the same set of variables (self-reported variables for the simple equation; self-reported plus clinical data for the enhanced equation) and the same statistical method (logistic regression), so that differences in the predictive power of individual coefficients could be attributed to age group. The outcome variable in our models was the cumulative incidence of diabetes throughout the observational period for each sample (10 years in CARDIA and CARDIA-10, 9 years in ARIC, and 7 years in CHS).
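As a reminder of how a logistic risk equation turns covariates into a predicted probability, here is a minimal sketch. The coefficient values, the intercept, and the example person are hypothetical and chosen only for illustration; they are not the estimates reported in this study.

```python
import math
from typing import Dict


def predicted_probability(intercept: float,
                          coefs: Dict[str, float],
                          person: Dict[str, float]) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta_k * x_k))))."""
    linear = intercept + sum(beta * person[name] for name, beta in coefs.items())
    return 1.0 / (1.0 + math.exp(-linear))


def odds_ratio(beta: float) -> float:
    """Odds ratio for a one-unit increase in a covariate with coefficient beta."""
    return math.exp(beta)


# Hypothetical "simple equation" coefficients (NOT the paper's estimates).
simple_coefs = {"age": 0.03, "male": -0.2, "bmi": 0.10, "family_history": 0.6}
person = {"age": 35, "male": 1, "bmi": 31.0, "family_history": 1}

p = predicted_probability(-7.0, simple_coefs, person)
```

An enhanced equation would have the same form, with clinical measures such as FPG or triglycerides added to the coefficient dictionary.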

To test and compare the performance of the simple and enhanced models, we randomly selected 70% of each sample to develop the models, and used the remaining 30% of individuals for validation. This bootstrap exercise was repeated 1000 times for each of the four datasets to strengthen the validity and generalizability of our findings. We evaluated the diagnostic properties of the simple and enhanced models across the four datasets on the remaining 30% of the sample. Predictive capacity was assessed using the area under the receiver operating characteristic curve (AUROC). A model with no predictive power has an area equal to 0.5, whereas a perfect model has an area equal to 1. We also assessed sensitivity, specificity, and positive and negative predictive values (PPV and NPV). Because the risk equations were estimated over different time periods for the different datasets, we computed 1-year probabilities to make the results more comparable.
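The validation quantities described above can be sketched in plain Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the conversion to a 1-year probability assumes a constant annual risk over the follow-up period, which the text does not specify.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_70_30(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and return (development 70%, validation 30%)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = (7 * n) // 10
    return idx[:cut], idx[cut:]


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0
               for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def _ratio(num: int, den: int) -> float:
    """Safe division: returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screening_metrics(scores: Sequence[float], labels: Sequence[int],
                      threshold: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV when flagging score >= threshold."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {"sensitivity": _ratio(tp, tp + fn), "specificity": _ratio(tn, tn + fp),
            "ppv": _ratio(tp, tp + fp), "npv": _ratio(tn, tn + fn)}


def one_year_probability(cumulative: float, years: float) -> float:
    """ASSUMPTION: constant annual risk, so 1 - (1 - P_T) ** (1 / T)."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)
```

In a full replication, the model would be refit on the development rows in each of the 1000 repetitions, and `auroc` and `screening_metrics` would be evaluated on the held-out validation rows.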

To explore the consequences of using coefficients from one model to estimate the probability of developing diabetes for individuals from another dataset and thus a different age group (eg, using the risk equation estimated with ARIC data for those aged 45–64 years to estimate risk in the CHS population aged 65 years and older), we applied coefficients from one risk equation to data for different age groups. We did so using a constrained logistic regression in which the intercept was allowed to vary (accounting for differences in the absolute risk across age groups) but the coefficients for the other variables were constrained to equal the coefficients from the original risk equation (maintaining the same ORs as the original equation). Thus, we applied the ARIC equation to the CARDIA, CARDIA-10, and CHS data, and so on.
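One way to picture the constrained regression is as an intercept-only recalibration: the source equation's slope coefficients give each person a fixed linear predictor (an offset), and only the constant term is re-estimated by maximum likelihood in the target data. The sketch below illustrates that idea under stated assumptions; the function names and the bisection solver are illustrative choices, not the authors' implementation.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def recalibrate_intercept(offsets: Sequence[float],
                          outcomes: Sequence[int],
                          tol: float = 1e-12) -> float:
    """Maximum-likelihood intercept when all slope coefficients are held fixed.

    offsets[i] is sum(beta_k * x_ik) computed with the SOURCE cohort's
    coefficients; outcomes[i] is 1 if person i developed diabetes.
    The score equation sum(y_i - sigmoid(a + offset_i)) = 0 is strictly
    decreasing in a, so it is solved here by bisection.
    """
    events = sum(outcomes)
    if events == 0 or events == len(outcomes):
        raise ValueError("intercept is not identified without both outcomes")

    def score(a: float) -> float:
        return events - sum(sigmoid(a + o) for o in offsets)

    lo, hi = -1.0, 1.0
    while score(lo) <= 0:   # expand until predictions at lo are too low
        lo *= 2.0
    while score(hi) >= 0:   # expand until predictions at hi are too high
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

After recalibration, the mean predicted probability in the target data matches the observed incidence, while the ORs implied by the source coefficients are unchanged.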

## Results

The initial CARDIA sample consisted of 5116 individuals aged 18–30 years. We excluded 78 with diabetes at baseline and 999 with incomplete data, leaving an analytic sample of 4039. At year 20, 3413 remained in the CARDIA sample, which represents our potential sample for the CARDIA-10 cohort. At years 10 and 20, we had information on self-reported diabetes status and FPG. We excluded those who had diabetes at year 10 (n=266) and used the covariates measured at year 10 as the baseline year. We excluded participants with incomplete information on the variables of interest at year 10 (n=274). The final CARDIA-10 sample used for estimation was 2873 people aged 28–40 years at baseline (ie, year 10 of CARDIA).

In ARIC, 15 792 individuals aged 45–64 years were recruited at baseline. We used baseline explanatory variables as predictors of the cumulative incidence of diabetes at year 9. We excluded 1163 individuals with diabetes at baseline, 2080 with missing data for the explanatory variables at baseline, and 3674 with missing data for the dependent variable at year 9 follow-up, leaving a final sample of 8875 individuals.

Our starting CHS sample was 5888 individuals aged 65 years or older. We used the 7-year follow-up for this analysis because it is the latest wave for which laboratory values were included in the public use dataset. By year 7, the sample consisted of 4100 participants with laboratory information on FPG. We excluded 501 persons with diabetes at baseline. Our final dataset, excluding individuals without a complete set of covariates, was 3094.
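The sample flow reported above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text; CHS is omitted because the number excluded for incomplete covariates is not reported separately. The helper name is hypothetical.

```python
from typing import Sequence


def remaining(start: int, exclusions: Sequence[int]) -> int:
    """Sample size left after subtracting each exclusion count from the start."""
    return start - sum(exclusions)


# Counts taken from the text: (starting sample, exclusions in the order reported).
cardia = remaining(5116, [78, 999])            # baseline diabetes, incomplete data
cardia_10 = remaining(3413, [266, 274])        # diabetes at year 10, incomplete data
aric = remaining(15792, [1163, 2080, 3674])    # baseline diabetes, missing covariates, missing outcome
```

These reproduce the analytic samples of 4039, 2873, and 8875 reported for CARDIA, CARDIA-10, and ARIC, respectively.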

Table 1 shows the baseline characteristics of study participants. Data are presented as means and SD, unless otherwise noted.

Tables 2 and 3 show the results of the simple and enhanced models. BMI was the only variable with significant predictive power in both simple and enhanced equations across all age groups. Parental history had predictive power for younger cohorts in the simple and enhanced models, but less predictive importance for the oldest cohort. In the enhanced models, FPG and triglycerides had the best predictive power (see online supplementary appendix A-1).

As expected, we found that AUROCs were higher for the enhanced models than for the simple models (tables 2 and 3). The simple and enhanced models show different levels of predictive power across age groups. AUROCs for simple risk equations were 0.72 for CARDIA; 0.79 for CARDIA-10; 0.75 for ARIC; and 0.69 for CHS. AUROCs for enhanced equations were 0.75 for CARDIA; 0.85 for CARDIA-10; 0.85 for ARIC; and 0.81 for CHS. Statistically significant differences between simple and enhanced models were present only for older age cohorts (ARIC and CHS).

Online supplementary appendix A-2 shows the performance of the simple and enhanced equations in identifying people who developed diabetes by quintiles of predicted risk thresholds using the split sample approach as a test of internal validity. At the top quintile, few individuals would be above the predicted value cut-off, and thus the sensitivity is low and specificity is high. PPV and NPV depend on the prevalence of diabetes in each sample. In younger cohorts, where the prevalence of diabetes is relatively low, PPV is low and NPV is high. In older cohorts, as the prevalence increases, so does the PPV.
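The dependence of PPV and NPV on prevalence follows from Bayes' rule. The sketch below illustrates it with hypothetical sensitivity, specificity, and prevalence values; none of these numbers come from the study.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive values from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, held fixed while prevalence changes.
ppv_low, npv_low = ppv_npv(0.8, 0.9, 0.05)    # lower-prevalence (younger) cohort
ppv_high, npv_high = ppv_npv(0.8, 0.9, 0.20)  # higher-prevalence (older) cohort
```

With the test characteristics held fixed, the higher-prevalence setting yields a higher PPV and a lower NPV, which is the pattern described for the older cohorts.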

Table 4 shows how predictions change when the equation estimated with one dataset is used to estimate the probability of developing diabetes for individuals of a different age group from another dataset. We report constrained regressions in which all coefficients except the constant term are forced to equal those of the source equation, so that only the constant term is recalibrated. We report the AUROC and forecasted 1-year probabilities. All constant terms (analogous to the baseline hazard rate) in the target and source datasets are statistically different. The further the target cohort is, in terms of age, from the cohort used to estimate the source coefficients, the less precise the results are compared with those obtained when the coefficients and the data come from the same cohort. Irrespective of whether the calibration is done using the simple or the enhanced model, equations based on younger age groups underpredict diabetes incidence when applied to older cohorts. Conversely, risk equations based on older age cohorts overpredict the likelihood of diabetes in younger cohorts.

For each dataset, we show the corresponding predicted probability of developing diabetes, and compare it with the probability obtained using the equation developed in that same dataset (ie, using the target data). The graphs in online supplementary appendix A-3 illustrate how one can overpredict or underpredict using the wrong equation, by plotting the information presented in table 4 across the entire distribution. Calibration of the constant term across cohorts helps resolve some of the unaccounted discrepancies across cohorts. Even with calibration, however, equations from older cohorts typically overpredict, while equations from younger cohorts underpredict, irrespective of the set of covariates used.

## Discussion

Our goal was to generate simple and enhanced age group-specific risk equations to predict the probability of developing type 2 diabetes, and to determine the extent to which patient characteristics matter differently across age groups. Often, risk factors are selected from many potential covariates based on the strength of association with the outcomes in a study sample. This study shows which variables matter in predicting the risk of diabetes and how their importance varies depending on age. Based on the rules by Hosmer and Lemeshow4 for interpreting AUROC values, we find that simple equations have an acceptable level of discrimination (0.7≤AUROC<0.8), while enhanced equations have very good discrimination (0.8≤AUROC) except for the youngest (18–30) age group, which is in the acceptable range. Overall, we find that risk equations have better predictability in middle-aged adults than in young and older populations. Thus, it is not surprising that most risk equations published in the literature focus on predictions for people older than 45 years.
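The cut points quoted above can be applied mechanically to the AUROCs reported in the Results. This is a small illustrative sketch: the function name is hypothetical, and the label used for values under 0.7 is an assumption, since the text names only the acceptable and very good ranges.

```python
from typing import Dict, Tuple


def discrimination_label(auroc: float) -> str:
    """Label an AUROC using the cut points quoted in the text."""
    if not 0.0 <= auroc <= 1.0:
        raise ValueError("AUROC must lie between 0 and 1")
    if auroc >= 0.8:
        return "very good"
    if auroc >= 0.7:
        return "acceptable"
    return "below acceptable"  # assumed label; not named in the text


# (simple, enhanced) AUROCs as reported in the Results section.
reported: Dict[str, Tuple[float, float]] = {
    "CARDIA": (0.72, 0.75),
    "CARDIA-10": (0.79, 0.85),
    "ARIC": (0.75, 0.85),
    "CHS": (0.69, 0.81),
}

labels = {cohort: (discrimination_label(simple), discrimination_label(enhanced))
          for cohort, (simple, enhanced) in reported.items()}
```

Applied strictly, the CHS simple equation (0.69) falls just under the 0.7 cut point.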

Our study shows that predictions vary markedly and significantly when coefficients derived from one age group are used to predict non-adjacent age groups. This suggests that the covariates differ in their power to predict future risk of diabetes across age groups. For example, while risk increases with age, age has a lower predictive power in older cohorts than in younger cohorts. Race, sex, and parental history are stronger predictors for younger age groups. Younger men are significantly less likely to develop diabetes than younger women, while this relationship does not hold true for older men and older women. This initial difference may be driven by the risk of gestational diabetes among women. BMI is the most consistent statistically significant indicator for diabetes across age groups and for both simple and enhanced equations. However, BMI matters more in the simple model than in the enhanced model. On average, a one-unit increase in BMI increases the distal probability of diabetes by 10% across studies (see online supplementary appendix figure A-1). Biomarkers have much narrower confidence intervals. SBP and HDL matter only for the middle age group. Triglycerides and FPG are statistically significant across all age groups, but they matter marginally more for older cohorts than for younger ones in correctly predicting the likelihood of diabetes.

Online supplementary appendix table A-4 summarizes results from 19 previous studies. Fourteen of these studies use logit models, and five use a proportional hazard model. All studies measuring current prevalence used non-clinical data only, while studies measuring future risk tended to include information on biomarkers. Approximately half of the studies found an acceptable level of discrimination (0.7≤AUROC<0.8), while the remaining reported very good discrimination (0.8≤AUROC). FINDRISC5 and the Diabetes Risk Calculator6 had the highest combined sensitivity and specificity, with AUROCs of 0.86 and 0.85, respectively. Cabrera de León *et al*7 achieved an AUROC of 0.84 for men and 0.87 for women. The high values in the discrimination ability of the Diabetes Risk Calculator and Cabrera de León *et al* are likely due to sample selection; these studies included individuals with undiagnosed diabetes and prediabetes. Previous models in the literature using the same data sources as our study (CARDIA,8 ARIC,9 10 and CHS11) achieved comparable sensitivity and specificity to ours, despite these previous models including additional variables (physical activity and diet) that we did not include because they were not uniformly coded across studies.

Enhanced risk equations provide better discrimination than simple risk equations, but the benefit of enhanced equations is smaller in younger cohorts, and there was no significant difference in the AUROC between simple and enhanced equations in young adults. This implies that a screening strategy based on sex, family history, race, and BMI in younger populations would be nearly as effective as one that requires collecting clinical samples. For older cohorts, there is a tradeoff between a simple model that could be used by more people, and an enhanced risk equation that would be more accurate, but would require costly laboratory tests. It is important to note that the simple and enhanced models did not differ significantly in terms of cumulative predictions; therefore, at the population level, a less expensive model performs as well as the more costly model. At the individual level, however, the costlier model will significantly increase the sensitivity of the estimates.

Four limitations related to the data are important to highlight. First, all three surveys experienced loss during follow-up. Individuals exited the sample as a result of death, relocation, or loss of interest in the study. Follow-up rates were 80% for CARDIA, 84% for CARDIA-10, 75% for ARIC, and 61% for CHS. Loss to follow-up could bias estimates if it is correlated with the likelihood of having diabetes and with individual characteristics. Second, the surveys do not define diabetes through OGTT, but through self-reported questionnaires and FPG; however, this is also a benefit as it more closely reflects common practice. Third, because the surveys used are not nationally representative, it is possible that the differences we attributed to age reflect, in part, geographical variations. Fourth, the surveys began in the 1980s and 1990s, and may not reflect current population characteristics and treatment approaches. However, they may reflect the underlying natural history of diabetes progression in the absence of formal interventions to prevent diabetes.

In summary, we found that risk equations have better predictability in middle-aged adults than in young and old populations. While the predictive capacity of equations based on biomarkers is, on average, better than that of equations based solely on self-reported variables, information from biomarkers is more reliable and important in older populations than in younger ones. This variability emphasizes the importance of using age-specific risk equations when assessing the need to screen for type 2 diabetes to improve accuracy of individual-level predictions. Using age-specific risk equations may be especially important for the development of practical risk stratification tools, as well as to provide more precise parameters for cost-effectiveness analyses.
