Epidemiology/Health Services Research

Can earlier biomarker measurements explain a treatment effect on diabetes incidence? A robust comparison of five surrogate markers

Abstract

Introduction We measured and compared five individual surrogate markers, each defined as the change from baseline to 1 year after randomization in hemoglobin A1c (HbA1c), fasting glucose, 2-hour postchallenge glucose, the triglyceride–glucose (TyG) index, or the homeostatic model assessment of insulin resistance (HOMA-IR), in terms of their ability to explain a treatment effect on reducing the risk of type 2 diabetes mellitus at 2, 3, and 4 years after treatment initiation.

Research design and methods Study participants were from the Diabetes Prevention Program study, randomly assigned to either a lifestyle intervention (n=1023) or placebo (n=1030). The surrogate markers were measured at baseline and 1 year, and diabetes incidence was examined at 2, 3, and 4 years postrandomization. Surrogacy was evaluated using a robust model-free estimate of the proportion of treatment effect explained (PTE) by the surrogate marker.

Results Across all time points, change in fasting glucose and HOMA-IR explained higher proportions of the treatment effect than 2-hour glucose, TyG index, or HbA1c. For example, at 2 years, fasting glucose explained the highest proportion (80.1%) of the treatment effect, followed by HOMA-IR (77.7%), 2-hour glucose (76.2%), and HbA1c (74.6%); the TyG index explained the smallest proportion (70.3%).

Conclusions These data suggest that, of the five examined surrogate markers, fasting glucose and HOMA-IR were stronger surrogate markers in terms of PTE than 2-hour glucose, HbA1c, and the TyG index.

What is already known on this topic

  • In the US Food and Drug Administration’s current table of surrogate markers/endpoints that have been used as the basis for drug approval or licensure, the only surrogate marker listed for type 2 diabetes is serum hemoglobin A1c (HbA1c); however, in the clinical literature, it is well known that there exist other potential surrogate markers for diabetes, such as fasting plasma glucose.

What this study adds

  • This study compares five surrogate markers for diabetes—HbA1c, fasting glucose, 2-hour postchallenge glucose, triglyceride–glucose index, and homeostatic model assessment of insulin resistance—in terms of their ability to explain a treatment effect on reducing the risk of type 2 diabetes mellitus at 2, 3, and 4 years after treatment initiation.

  • This study is the first to compare these five surrogate markers using a robust model-free approach.

How this study might affect research, practice or policy

  • These results provide insight into the complex relationship between the change in these markers and diabetes incidence that may be useful in the design and analysis of future studies on diabetes prevention.

Introduction

Clinical studies examining interventions to prevent or delay type 2 diabetes mellitus (diabetes) typically require long follow-up of participants in order to have sufficient power to detect an intervention or treatment effect. In such studies, the availability of a surrogate marker that can be used in place of the primary outcome to test for a treatment effect has the potential to decrease study time and costs, as well as patient burden.

The Accelerated Approval Program of the US Food and Drug Administration (FDA) allows for drugs to be approved based on demonstrated effectiveness on a surrogate marker.1 While this program allows effective drugs to be made available to patients in need sooner, the requirements for what constitutes a surrogate marker are not clear. The FDA describes a surrogate marker as an intermediate endpoint that measures “a therapeutic effect that is considered reasonably likely to predict the clinical benefit of a drug, such as an effect on irreversible morbidity and mortality.”1 This definition, and the program more generally, have been widely criticized for potentially allowing drugs to be approved and marketed without demonstrated effectiveness on the primary clinical outcome.2 A recent controversial example is the approval of aducanumab (Aduhelm) for Alzheimer’s disease, which showed effectiveness with respect to reducing amyloid plaques in the brain even though the theory linking amyloid clearance with slowing cognitive and functional decline is not completely clear.3

In the FDA’s current table of surrogate markers/endpoints that have been used as the basis for drug approval or licensure, the only surrogate marker listed for type 2 diabetes is serum hemoglobin A1c (HbA1c).4 However, it is well known in the clinical literature that other potential surrogate markers for diabetes exist, such as fasting plasma glucose.5 6 More recently proposed surrogate markers include the triglyceride–glucose (TyG) index and the homeostatic model assessment of insulin resistance (HOMA-IR), a widely used indirect measure of insulin resistance; both have been shown to be predictive of diabetes incidence.7–9

The statistical validation of surrogate markers is complex, and there is currently no agreement in the statistical literature on a single optimal way to validate a surrogate.10 The most commonly used measure of surrogate validity in practice is the proportion of the treatment effect on the primary outcome that is explained by the surrogate marker (proportion of treatment effect explained, PTE).11–13 While many other statistical measures have been proposed to evaluate surrogates, the PTE's single-number summary and intuitive interpretation have led to its widespread use and acceptance as a measure of surrogate strength.14–18 For example, since the PTE is a proportion, a value close to 1.0 reflects a surrogate that can explain almost all of the treatment effect on the primary outcome, while a value close to 0 reflects a surrogate that essentially cannot explain any of the treatment effect. Although there is no agreed-upon single threshold for what constitutes a “good” surrogate marker, previous work has proposed that PTE estimates above 0.90 and/or PTE estimates with a 95% CI whose lower bound is greater than 0.50 or 0.75 are indicative of a reasonably strong surrogate marker.19 A high PTE suggests that a test of the treatment effect on the surrogate marker is likely to have satisfactory statistical power and to provide a good approximation of the treatment effect on the primary outcome. A major criticism of the PTE is that its statistical inference relies on correct parametric model specification (eg, linear regressions), which must accurately capture the complex relationship between the treatment, the surrogate marker, and the primary outcome.19–21 Fortunately, recent work has developed a non-parametric, model-free approach to estimating the PTE that does not require any parametric specification and has been shown to perform well when the true relationship between these variables is complex.21–23
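The two heuristic criteria quoted above can be read as separate checks on a PTE estimate and its 95% CI. The Python sketch below is illustrative only: the container class, the function name, and the example numbers are hypothetical, and the default cutoffs are simply the values given in the text (0.90 for the point estimate; 0.50, or the stricter 0.75, for the lower CI bound).

```python
from dataclasses import dataclass


@dataclass
class PTEEstimate:
    """Hypothetical container for a PTE point estimate and its 95% CI."""
    estimate: float
    ci_lower: float
    ci_upper: float


def strength_checks(pte: PTEEstimate,
                    point_cutoff: float = 0.90,
                    lower_bound_cutoff: float = 0.50) -> dict:
    """Apply the two heuristic criteria separately; neither is a formal rule."""
    return {
        # Criterion 1: point estimate above the cutoff (e.g., 0.90).
        "point_estimate_ok": pte.estimate > point_cutoff,
        # Criterion 2: lower 95% CI bound above the cutoff (0.50 or 0.75).
        "ci_lower_bound_ok": pte.ci_lower > lower_bound_cutoff,
    }


# Hypothetical estimate: PTE = 0.80 with 95% CI (0.64, 1.00).
example = PTEEstimate(estimate=0.80, ci_lower=0.64, ci_upper=1.00)
print(strength_checks(example))
print(strength_checks(example, lower_bound_cutoff=0.75))
```

With these hypothetical numbers the estimate fails the 0.90 point-estimate check, passes the lower-bound check at 0.50, and fails it at 0.75, which shows how the choice of criterion can change the verdict on the same marker.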

In this study, we construct a non-parametric estimate of the PTE using new statistical techniques without restrictive model assumptions. The model-free PTE estimate allows us to robustly measure and compare five surrogate markers (HbA1c, fasting glucose, 2-hour postchallenge glucose, TyG, and HOMA-IR) for diabetes diagnosis among patients at high risk for diabetes using secondary data from the Diabetes Prevention Program (DPP).

Subjects, materials, and methods

DPP and participants

The DPP was a randomized, double-blind, placebo-controlled clinical trial designed to test interventions to prevent or delay the development of type 2 diabetes in high-risk adults with impaired glucose tolerance. Participants were randomly assigned to four groups: placebo, lifestyle intervention, metformin (850 mg two times per day), and troglitazone. Detailed study design and primary study results are reported elsewhere.24 25 This secondary data analysis is based on publicly available data from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository.26 Only clinics with institutional review board (IRB) approval to distribute their data to the repository are included in the data release; this constitutes 3665 of the original 3819 DPP participants. In this study, we specifically examined the treatment effect of the lifestyle intervention compared with placebo. One participant missing HbA1c at baseline was excluded from the analysis. Our final analytic data set comprised 1023 participants assigned to the lifestyle intervention and 1030 participants assigned to placebo.

Primary outcome

The primary outcome in the DPP study was time to a type 2 diabetes diagnosis, which was defined by the DPP protocol based on either fasting glucose or 2-hour postchallenge glucose: fasting glucose ≥140 mg/dL or 2-hour postchallenge glucose ≥200 mg/dL (for visits through June 23, 1997), and fasting glucose ≥126 mg/dL or 2-hour postchallenge glucose ≥200 mg/dL (for visits from June 24, 1997 through April 1, 2000). Fasting glucose was measured at annual and midyear visits, while an oral glucose tolerance test was completed only at annual visits. A participant with an elevated glucose level at either an annual or a midyear visit had a confirmation visit within 6 weeks; if the same glucose measure was still elevated, the participant was diagnosed as having reached the primary outcome of type 2 diabetes.
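The diagnosis rule above combines a date-dependent fasting cutoff, a fixed 2-hour cutoff, and a confirmation step. The Python sketch below encodes that logic for a single screening/confirmation pair. It is a simplification under stated assumptions, not the DPP protocol code: the function names are hypothetical, glucose values are in mg/dL with `None` for a measure that was not taken (e.g., no oral glucose tolerance test at a midyear visit), the 6-week window is treated as a strict cutoff, and "the same glucose measure" is read as requiring the same measure (fasting or 2-hour) to be elevated at both visits.

```python
from datetime import date, timedelta
from typing import Optional, Set

# Cutoff dates from the text: visits through June 23, 1997 used a fasting
# cutoff of 140 mg/dL; visits from June 24, 1997 onward used 126 mg/dL.
LAST_DAY_OLD_FASTING_CUTOFF = date(1997, 6, 23)
TWO_HOUR_CUTOFF = 200.0
CONFIRMATION_WINDOW = timedelta(weeks=6)  # assumption: strict 6-week window


def elevated_measures(visit: date, fasting: Optional[float],
                      two_hour: Optional[float]) -> Set[str]:
    """Return the glucose measures that meet the protocol cutoff at a visit."""
    fasting_cutoff = 140.0 if visit <= LAST_DAY_OLD_FASTING_CUTOFF else 126.0
    flagged = set()
    if fasting is not None and fasting >= fasting_cutoff:
        flagged.add("fasting")
    # The 2-hour value comes from the OGTT, which is done only at annual visits.
    if two_hour is not None and two_hour >= TWO_HOUR_CUTOFF:
        flagged.add("two_hour")
    return flagged


def confirmed_diagnosis(screen_visit: date, screen_fasting: Optional[float],
                        screen_two_hour: Optional[float],
                        confirm_visit: date, confirm_fasting: Optional[float],
                        confirm_two_hour: Optional[float]) -> bool:
    """True if a measure elevated at screening is still elevated at confirmation."""
    initial = elevated_measures(screen_visit, screen_fasting, screen_two_hour)
    gap = confirm_visit - screen_visit
    if not initial or not (timedelta(0) < gap <= CONFIRMATION_WINDOW):
        return False
    repeat = elevated_measures(confirm_visit, confirm_fasting, confirm_two_hour)
    # Diagnosis requires at least one measure to be elevated at both visits.
    return bool(initial & repeat)
```

Under this reading, a fasting value of 130 mg/dL counts as elevated after the June 1997 protocol change but not before it, and a fasting elevation followed only by a 2-hour elevation at the confirmation visit does not yield a diagnosis.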

Surrogate markers

Five surrogate markers were examined: HbA1c, fasting glucose, 2-hour postchallenge glucose, TyG index, and HOMA-IR. The TyG index was calculated as ln[fasting serum triglyceride (mg/dL) × fasting plasma glucose (FPG, mg/dL)/2].9 HOMA-IR was calculated as fasting serum insulin (μU/mL) × FPG (mg/dL)/405.8 The change in each surrogate marker was defined as the change in the biomarker from baseline to 1 year postrandomization.
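The two derived indices and the 1-year change follow directly from the formulas above. The Python sketch below assumes inputs in the units stated in the text (triglyceride and glucose in mg/dL, insulin in μU/mL); the function names and the example participant values are hypothetical.

```python
import math


def tyg_index(triglyceride_mg_dl: float, fpg_mg_dl: float) -> float:
    """TyG index = ln[triglyceride (mg/dL) x FPG (mg/dL) / 2]."""
    return math.log(triglyceride_mg_dl * fpg_mg_dl / 2.0)


def homa_ir(insulin_uu_ml: float, fpg_mg_dl: float) -> float:
    """HOMA-IR = fasting insulin (uU/mL) x FPG (mg/dL) / 405."""
    return insulin_uu_ml * fpg_mg_dl / 405.0


def one_year_change(baseline: float, year_one: float) -> float:
    """Change from baseline to 1 year; negative values indicate a decrease."""
    return year_one - baseline


# Hypothetical participant: fasting measurements at baseline and at 1 year.
baseline_tyg = tyg_index(triglyceride_mg_dl=150.0, fpg_mg_dl=107.0)
year_one_tyg = tyg_index(triglyceride_mg_dl=120.0, fpg_mg_dl=100.0)
print(round(baseline_tyg, 2), round(one_year_change(baseline_tyg, year_one_tyg), 2))
print(homa_ir(insulin_uu_ml=27.0, fpg_mg_dl=105.0))
```

Because the TyG index is on a log scale, its 1-year change is a log-ratio of the triglyceride × glucose product, whereas the HOMA-IR change is on the original (linear) scale; this is worth keeping in mind when comparing distributions of change across markers.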

Statistical methods

The cumulative incidence by group was estimated using the Kaplan-Meier estimator.27 Patients who died within 4 years of randomization (n=9) were censored at the time of death. The distributions of the change in each surrogate from randomization to 1 year among participants still under observation at 1 year were examined descriptively using stratified boxplots. Each surrogate marker was evaluated by calculating the proportion of the treatment effect on the primary outcome that was explained (PTE) by the treatment effect on the surrogate information at 1 year postrandomization. The treatment effect on the primary outcome was quantified as the difference in diabetes cumulative incidence at a specific time t, where t=2, 3, and 4 years were examined. That is, the treatment effect at time t was $\Delta(t) = P(T^{(0)} \le t) - P(T^{(1)} \le t)$, where $T^{(R)}$ is the time from baseline to diabetes diagnosis under intervention $R$ ($R=1$, lifestyle intervention; $R=0$, placebo); as illustrated in figure 1, $\Delta(t)$ is the distance between the two treatment group curves at time t. The PTE is defined as the ratio of the portion of the treatment effect that is explained by the surrogate to the total treatment effect on the primary outcome. Specifically, $R_S(t) = 1 - \Delta_S(t)/\Delta(t)$, where $\Delta_S(t)$ denotes the residual treatment effect on the primary outcome after the treatment effect on the surrogate marker is removed.13 This is analogous to the direct and indirect effects framework in causal inference.14 Both $\Delta(t)$ and $\Delta_S(t)$ are estimated non-parametrically using robust kernel-smoothing methods that do not require any parametric modeling.22 Perturbation resampling, a variant of the bootstrap, was used to construct 95% CIs.28 29
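To make the definition of $\Delta(t)$ concrete, the Python sketch below estimates each group's cumulative incidence $P(T^{(R)} \le t)$ with an (optionally weighted) Kaplan-Meier estimator, takes the placebo-minus-lifestyle difference, and forms a 95% CI by perturbation resampling. Assumptions to flag: the function names and the simulated data are hypothetical; the perturbation weights are drawn from a standard exponential distribution (mean 1, variance 1), a common choice for this resampling scheme; the CI uses percentiles of the perturbed estimates; and deaths are coded as censored observations. The sketch covers $\Delta(t)$ only; the residual effect $\Delta_S(t)$ requires the kernel-smoothing estimator cited above, which this analysis obtained from the Rsurrogate package and which is not reproduced here.

```python
import numpy as np


def km_cumulative_incidence(time, event, t, weights=None):
    """(Weighted) Kaplan-Meier estimate of P(T <= t) = 1 - S(t).

    time: follow-up time; event: True if diabetes was diagnosed, False if
    censored (including deaths); weights: optional positive weights.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    w = np.ones_like(time) if weights is None else np.asarray(weights, dtype=float)
    survival = 1.0
    # Product over distinct event times up to t (np.unique returns them sorted).
    for u in np.unique(time[event & (time <= t)]):
        at_risk = w[time >= u].sum()
        events = w[(time == u) & event].sum()
        survival *= 1.0 - events / at_risk
    return 1.0 - survival


def delta(t, time1, event1, time0, event0, w1=None, w0=None):
    """Delta(t) = P(T^(0) <= t) - P(T^(1) <= t); R=1 lifestyle, R=0 placebo."""
    return (km_cumulative_incidence(time0, event0, t, w0)
            - km_cumulative_incidence(time1, event1, t, w1))


def perturbation_ci(t, time1, event1, time0, event0, n_draws=500, seed=0):
    """Percentile 95% CI for Delta(t) using Exp(1) perturbation weights."""
    rng = np.random.default_rng(seed)
    draws = [
        delta(t, time1, event1, time0, event0,
              w1=rng.exponential(1.0, size=len(time1)),
              w0=rng.exponential(1.0, size=len(time0)))
        for _ in range(n_draws)
    ]
    lower, upper = np.percentile(draws, [2.5, 97.5])
    return float(lower), float(upper)


# Simulated (not DPP) data: exponential event times, censoring at 4 years.
sim = np.random.default_rng(1)
t1 = sim.exponential(20.0, 500)  # "lifestyle": lower event rate
t0 = sim.exponential(8.0, 500)   # "placebo": higher event rate
e1 = t1 <= 4.0
e0 = t0 <= 4.0
t1 = np.minimum(t1, 4.0)
t0 = np.minimum(t0, 4.0)
print(delta(2.0, t1, e1, t0, e0), perturbation_ci(2.0, t1, e1, t0, e0))
```

Perturbation resampling reweights every participant rather than redrawing participants, so each replicate reuses the same data with random positive weights; this is why the Kaplan-Meier function accepts weights.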

Figure 1

Cumulative incidence by treatment group, with dashed lines at t=2 years, t=3 years, and t=4 years; the treatment effect at time t, $\Delta(t)$, is equal to the distance between the two treatment group curves at each specific time t.

Importantly, this statistical approach evaluates a surrogate marker by treating the surrogate information at 1 year as a combination of (1) the 1-year change in the surrogate marker for participants who had not yet been diagnosed with diabetes and (2) diabetes incidence for participants diagnosed before 1 year. Alternative methods that either ignore diabetes incidence before 1 year or evaluate surrogacy only among participants not diagnosed with diabetes before 1 year are not appropriate: the former discards part of the treatment effect that has already occurred, and the latter conditions on a postrandomization event, so the treatment groups being compared are no longer balanced by randomization. Our approach incorporates all information about diabetes incidence available at the time of surrogate marker measurement in order to quantify the overall strength of the surrogate information at 1 year with respect to predicting the treatment effect at time t. This approach also allows us to calculate the incremental value (IV) of the surrogate marker alone, that is, an estimate of the proportion of the total treatment effect on the primary endpoint that is explained by the surrogate marker alone.22 For each surrogate marker, we calculate (1) the overall PTE of the surrogate information and (2) the IV of the marker measurement alone.
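The way the 1-year surrogate information combines the two sources can be written as a small per-participant rule. The Python sketch below is a hypothetical illustration: the function names and the tuple encoding are assumptions, "diagnosed before 1 year" is taken to mean a diagnosis time (in years) strictly below 1, and marker values are assumed to be available for everyone not diagnosed by then. It also encodes one common definition of the IV, stated here as an assumption rather than as the exact formula used: the PTE of the full surrogate information minus the PTE of the 1-year diabetes incidence information alone.

```python
from typing import Optional, Tuple

LANDMARK_YEARS = 1.0  # surrogate information is measured at 1 year


def surrogate_information(diagnosis_years: Optional[float],
                          marker_baseline: Optional[float],
                          marker_year_one: Optional[float]
                          ) -> Tuple[str, Optional[float]]:
    """Encode one participant's 1-year surrogate information (hypothetical).

    diagnosis_years is None if the participant was never diagnosed.
    Early diagnoses carry incidence information; everyone else contributes
    the 1-year change in the marker (assumed to be measured).
    """
    if diagnosis_years is not None and diagnosis_years < LANDMARK_YEARS:
        return ("diagnosed_before_1y", None)
    return ("marker_change", marker_year_one - marker_baseline)


def incremental_value(pte_surrogate_information: float,
                      pte_incidence_only: float) -> float:
    """Assumed IV definition: PTE gained by adding the marker to early incidence."""
    return pte_surrogate_information - pte_incidence_only
```

Keeping early diagnoses as their own category, rather than dropping those participants, is what lets the full treatment effect accrued by 1 year remain in the comparison.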

Statistical analyses were performed using R V.4.2.1 and the R package Rsurrogate.30 Throughout, statistical inference is based on 95% CIs rather than p values.

Data and resource availability

The DPP study data used in this analysis are publicly available through the NIDDK Central Repository (https://repository.niddk.nih.gov/studies/dpp/) upon establishment of data use agreement.

Results

Overall, 67.4% of the participants were female, half were 50 years of age or older, 38% had a body mass index of 35 kg/m² or higher, and the majority (56.8%) were white (table 1). At baseline, the participants had a mean HbA1c of 5.9%, a mean fasting glucose of 107.2 mg/dL, a mean 2-hour glucose of 164.5 mg/dL, a mean TyG index of 8.9, and a mean HOMA-IR of 7.0. Participant characteristics were similar across the treatment groups, as expected given randomization. Descriptively, the distribution of the change in each of the five markers differed by treatment group (figure 2).

Figure 2

Distribution of marker changes from randomization to 1 year by treatment group, among participants still under observation at 1 year. HbA1c, hemoglobin A1c; HOMA-IR, homeostatic model assessment of insulin resistance; TyG, triglyceride–glucose index.

Table 1
Participant characteristics overall and by treatment group

The cumulative incidence of diabetes was lower among patients assigned to lifestyle intervention compared with those assigned to placebo, which is consistent with the primary DPP study results (table 1 and figure 1). The estimated treatment effect on diabetes incidence, $\Delta(t)$, at 2 years was 0.113 (95% CI 0.084, 0.139), at 3 years was 0.149 (95% CI 0.108, 0.189), and at 4 years was 0.158 (95% CI 0.104, 0.217).

Table 2 shows the estimated PTE and IV for each surrogate marker at each examined time point, along with 95% CIs. At all time points, fasting glucose and HOMA-IR explained higher proportions of the treatment effect than 2-hour glucose, TyG, or HbA1c. At 2 years, fasting glucose explained the highest proportion (80.1%) of the treatment effect (PTE=0.801, 95% CI 0.639, 1.005), followed by HOMA-IR (PTE=0.777, 95% CI 0.605, 0.984), 2-hour glucose (PTE=0.762, 95% CI 0.621, 0.934), and HbA1c (PTE=0.746, 95% CI 0.587, 0.940); the TyG index explained the smallest proportion (70.3%; PTE=0.703, 95% CI 0.567, 0.864). At 3 years, fasting glucose continued to explain the highest proportion (71.2%), and HbA1c explained the smallest (52.9%); the PTE of every marker decreased as t moved farther from 1 year. At 4 years, HOMA-IR explained the highest proportion (62.3%) of the treatment effect, followed by fasting glucose (61.4%), 2-hour glucose (57.5%), TyG (55.8%), and HbA1c (43.2%).

Table 2
Proportion of treatment effect explained and the incremental value of each surrogate marker

Patterns for the IV paralleled the PTE, as expected, with IV estimates for the five markers ranging from 0.069 to 0.124 at 2 years, from 0.049 to 0.232 at 3 years, and from 0.016 to 0.208 at 4 years. The 95% CIs for the IV of fasting glucose, 2-hour glucose, and HOMA-IR for all time points excluded 0, thus providing evidence that the IV of these markers is greater than 0. In contrast, the 95% CIs for the IV of HbA1c included 0 for all time points. For the TyG index, there was evidence of positive IV for t=3 and t=4 years, but not for t=2 years.

Discussion

The results demonstrate that, of the five examined surrogate markers, fasting glucose and HOMA-IR explained the largest proportions of the treatment effect when comparing lifestyle intervention with placebo. The TyG index and HbA1c were the weakest surrogate markers in terms of the proportion of the treatment effect that could be explained. The PTE for all markers decreased for later values of t, which is expected, because the treatment effect becomes more difficult to capture as the time between the surrogate marker measurement and the treatment effect measurement increases.

While we focus on the surrogate information at 1 year, our approach could similarly be used to examine the PTE of surrogate information at 2 years, when t=3 years or t=4 years. We conducted such an analysis as a sensitivity analysis (results not shown) examining the average changes in the surrogate markers during the first 2 years, investigating the PTE and IV at time points after 2 years only (3 years and 4 years). The general patterns were similar to the main results, where fasting glucose, HOMA-IR, and 2-hour glucose appear to have higher surrogacy than HbA1c and the TyG index.

Previous work examining TyG and HOMA-IR has demonstrated that these markers are independently predictive of diabetes incidence. Park et al8 showed that the TyG index was significantly better than HOMA-IR in terms of the area under the time-dependent receiver operating characteristic curve (AUC). Importantly, while prediction accuracy information is useful, good prediction accuracy does not necessarily make a measurement a good surrogate marker. For example, in Park et al,8 the AUC for the TyG index in predicting diabetes incidence was 0.640 (95% CI 0.628, 0.652), but this does not tell us about the ability of the TyG index to capture a treatment effect. A high AUC for predicting the primary outcome does not necessarily mean that the marker can replace the primary outcome in a future study and be used to make inference about the treatment effect on the primary outcome. In addition, the TyG index is a function of serum triglyceride, which is often considered a highly sensitive lifestyle measure that is influenced by factors such as alcohol intake and by medications such as beta-blockers, corticosteroids, and estrogens.31 The potential influence of such factors on the TyG index over time, and especially of changes in them, may hinder the utility of the TyG index as a surrogate marker and possibly lower its PTE.

The PTE is a popular measure for surrogacy evaluation but is also frequently criticized.20 32 In the past, the PTE has usually been estimated within a simple regression framework: a regression model is first fit with only the treatment indicator, and a second model is then fit that adds the surrogate marker.12 The PTE is estimated from how much the regression coefficient for treatment changes when the surrogate marker is added to the model. The fundamental problem with this approach is that it is valid only if both regression models hold. In practice, it is almost impossible to correctly specify models to describe clinical data. Furthermore, in the censored outcome setting where Cox proportional hazards models are used for the analysis, it is known that the relevant model assumptions generally cannot all hold simultaneously.19 Using such a model-based approach can result in substantial bias in the estimate of the PTE.19 21 The non-parametric methods used in this analysis do not assume any model specification and thus do not have this disadvantage. However, non-parametric methods generally require a large sample size and are not feasible in small data sets. Future statistical work is needed to develop robust methods that can be used when the sample size is small.
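The change-in-coefficient approach described above can be sketched for a continuous outcome with ordinary least squares. This illustrates the traditional model-based estimator, not the non-parametric method used in this analysis; the function names and the simulated data are hypothetical. The data are generated so that both linear models hold and the treatment acts on the outcome only through the surrogate, which is exactly the situation in which this estimator is valid.

```python
import numpy as np


def treatment_coefficient(y, treatment, *covariates):
    """OLS coefficient on treatment in a model with an intercept."""
    X = np.column_stack([np.ones(len(y)), treatment, *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def regression_pte(y, treatment, surrogate):
    """PTE = 1 - (surrogate-adjusted treatment coefficient / unadjusted one)."""
    beta_unadjusted = treatment_coefficient(y, treatment)
    beta_adjusted = treatment_coefficient(y, treatment, surrogate)
    return 1.0 - beta_adjusted / beta_unadjusted


# Simulated data in which treatment a acts on y only through the surrogate s.
rng = np.random.default_rng(2024)
n = 5000
a = rng.integers(0, 2, size=n)
s = 1.0 * a + rng.normal(0.0, 1.0, size=n)
y = 2.0 * s + rng.normal(0.0, 1.0, size=n)
print(regression_pte(y, a, s))  # expected to be close to 1 here
```

The estimate is trustworthy here only because the simulation makes both regressions correct. With a censored outcome, the analogous calculation compares Cox model coefficients from two models that generally cannot both be correctly specified, which is the source of the bias noted above.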

The ability to use a surrogate marker to evaluate a treatment effect earlier or at lower cost is the ultimate goal of surrogate evaluation. While we focus on the surrogate information at 1 year, our approach could similarly be used to examine the PTE of surrogate information at year 2 or 3 when t=4 years. In addition, parallel model-free methods exist to investigate whether this earlier information can achieve power similar to that of a study based on diabetes incidence at year 4, which would suggest that a shorter clinical trial is possible. However, using an invalid surrogate marker to make a decision about a treatment effect can have dire consequences. One could conclude that there is a treatment effect when there truly is no treatment effect on the primary outcome or, conversely, conclude that there is no treatment effect when there truly is one. The most worrisome concern is the so-called surrogate paradox, in which the treatment has a positive effect on the surrogate marker but a negative effect on the primary outcome.20 33 Strong statistical assumptions are needed to ensure that this situation does not occur. Evaluating a surrogate marker using data from a randomized trial, as we have done in this study, reduces (but does not eliminate) the number of assumptions needed; evaluating a surrogate marker using observational data is much more difficult because the treatment groups are no longer balanced at baseline.

Although HbA1c is the only surrogate marker listed for type 2 diabetes in the FDA’s current table of surrogate markers/endpoints, we found that both fasting glucose and HOMA-IR explained more of the treatment effect on diabetes incidence than did HbA1c.4 Interestingly, for individuals at high risk for diabetes, treatments aimed at preventing or delaying a diabetes diagnosis often no longer even attempt to promise effectiveness with respect to actually preventing diabetes, but instead directly market their effectiveness on a surrogate itself, such as glucose or HbA1c. This marketing practice can be dangerous because individuals eligible to take the medication are then likely to draw the causal inference on their own, for example, “if I take this drug and I lower my glucose, then I will be less likely to be diagnosed with diabetes.”

An important area for future work is the examination of potential heterogeneity in the utility of a surrogate marker.34 That is, a biomarker may be a valid surrogate marker for certain subpopulations but not for others. Park et al8 conducted a population-based cohort study using data from the Korean Genome and Epidemiology Study and compared the predictive ability of the TyG index and HOMA-IR. They found that the TyG index was a better predictor of incident diabetes than HOMA-IR and hypothesized that this may be attributable to the high glycemic index of the Korean diet and the dual dimensions of the TyG index. Specifically, while HOMA-IR reflects hepatic insulin resistance, the TyG index is thought to reflect the insulin resistance of both adipose and hepatic tissue, which may make the TyG index superior in predicting diabetes incidence, particularly among Koreans, who tend to consume high glycemic index foods. In addition, previous work has demonstrated lower pancreatic volume, higher pancreatic fat content, and lower HOMA-IR among healthy Koreans compared with a matched sample of healthy white individuals.35 It is unknown whether similar heterogeneity exists with respect to surrogacy for these markers; for example, the PTE for the TyG index and/or HOMA-IR may differ across subpopulations. Unfortunately, the racial/ethnic composition of the DPP study does not allow us to examine this, but future work is needed to explore surrogate heterogeneity, particularly in Asian populations.

For a surrogate marker to gain clinical acceptance as a valid surrogate marker for a particular outcome, multiple studies with different patient populations and using different types of analyses are often needed. Given the potential consequences of using an invalid surrogate, results from a single study are simply not enough. Unfortunately, while a plethora of clinical trials are constantly in progress, it is often difficult to gain access to data from these clinical trials. Thus, it is essential to emphasize the importance of data sharing and data access for the purpose of furthering surrogate marker research and evaluation.

Our study has some limitations. First, the DPP participant population may not be representative of the general population, so our results may not generalize, particularly to populations with different racial/ethnic compositions. This is especially relevant in surrogate marker research because concerns about the transportability of surrogate information from one study to a future study may affect a decision to use the surrogate marker to test for a treatment effect in that future study. Second, none of the examined markers explained a sufficiently high proportion (eg, over 0.90) of the treatment effect, so the results from this study do not necessarily support replacing diabetes incidence with change in any of these markers in a future study. Despite these limitations, this study is the first to compare these five surrogate markers using a robust model-free approach, and these results provide insight into the complex relationship between the change in these markers and diabetes incidence that may be useful for future studies in diabetes prevention.