Discussion
The results demonstrate that, of the five examined surrogate markers, fasting glucose and HOMA-IR explained the largest proportions of the treatment effect when comparing lifestyle intervention with placebo. The TyG index and HbA1c were the weakest surrogate markers in terms of the proportion of the treatment effect that could be explained. The PTE for all markers decreased for later values of t. This is expected, because the treatment effect generally becomes more difficult to capture as the length of time between the surrogate marker measurement and the treatment effect measurement increases.
While we focus on the surrogate information at 1 year, our approach could similarly be used to examine the PTE of surrogate information at 2 years, when t=3 years or t=4 years. We conducted such an analysis as a sensitivity analysis (results not shown) examining the average changes in the surrogate markers during the first 2 years, investigating the PTE and IV at time points after 2 years only (3 years and 4 years). The general patterns were similar to the main results, where fasting glucose, HOMA-IR, and 2-hour glucose appear to have higher surrogacy than HbA1c and the TyG index.
Previous work examining TyG and HOMA-IR has demonstrated that these markers are independently predictive of diabetes incidence. Park et al8 showed that the TyG index was significantly better than HOMA-IR in terms of the area under the time-dependent receiver operating characteristic curve (AUC). Importantly, while prediction accuracy information is useful, estimates reflecting good prediction accuracy do not necessarily translate to a measurement being a good surrogate marker. For example, in Park et al,8 the AUC for the TyG index in predicting diabetes incidence was 0.640 (95% CI 0.628, 0.652), but this does not tell us about the ability of the TyG index to capture a treatment effect. A high AUC for predicting the primary outcome does not necessarily mean that the marker can replace the primary outcome in a future study and be used to make inferences about the treatment effect on the primary outcome. In addition, the TyG index is a function of serum triglyceride, which is often considered to be a highly sensitive lifestyle measure that is influenced by factors such as alcohol intake and medications such as beta-blockers, corticosteroids, and estrogens.31 The potential influence of such factors, and especially of any changes in these factors over time, on the TyG index may hinder the utility of the TyG index as a surrogate marker and possibly lower the PTE of this marker.
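The distinction between prediction accuracy and surrogacy can be illustrated with a small simulation. The following Python sketch is illustrative only and is not part of the analysis in this study: the data are simulated, all coefficient values are arbitrary assumptions, and the helper `auc` is a hypothetical name for a standard rank-based (Mann-Whitney) AUC. The simulated marker strongly predicts the event but is, by construction, unaffected by treatment, so it cannot carry any information about the treatment effect.

```python
import numpy as np

def auc(marker, event):
    """Rank-based (Mann-Whitney) AUC; assumes no ties in the marker."""
    ranks = np.empty(len(marker))
    ranks[np.argsort(marker)] = np.arange(1, len(marker) + 1)
    n_pos = event.sum()
    n_neg = len(event) - n_pos
    return (ranks[event == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated (hypothetical) trial: all numbers below are assumptions.
rng = np.random.default_rng(1)
n = 20000
treat = rng.integers(0, 2, size=n)   # randomized 1:1 treatment indicator
marker = rng.normal(size=n)          # predictive marker, NOT affected by treatment
logit = -1.0 + 1.5 * marker - 1.0 * treat
event = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Prediction accuracy of the marker for the event
marker_auc = auc(marker, event)
# Treatment effect on the marker (difference in means between arms)
effect_on_marker = marker[treat == 1].mean() - marker[treat == 0].mean()
# Treatment effect on the event (risk difference between arms)
effect_on_event = event[treat == 1].mean() - event[treat == 0].mean()

print(f"AUC of marker: {marker_auc:.3f}")
print(f"Treatment effect on marker: {effect_on_marker:.3f}")
print(f"Treatment effect on event risk: {effect_on_event:.3f}")
```

Under these assumptions the marker has a high AUC and treatment clearly lowers the event risk, yet the treatment effect on the marker is approximately zero, so the marker would be of no use as a replacement for the primary outcome.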
The PTE is a popular measure for surrogacy evaluation, but is also frequently criticized.20 32 In the past, the tendency has been to estimate the PTE using a simple regression model framework, where a regression model is first fit with only the treatment indicator in the model and then a second model is fit adding in the surrogate marker.12 The PTE is estimated by looking at how much the regression coefficient for treatment changes when the surrogate marker is added to the model. The fundamental problem with this approach is that it is only valid if both regression models are correctly specified. In practice, it is almost impossible to correctly specify models to describe clinical data. Furthermore, in the censored outcome setting where Cox proportional hazards models are used for the analysis, it is known that all relevant model assumptions cannot simultaneously hold in most cases.19 Using such a model-based approach can result in substantial bias in the estimate of the PTE.19 21 The non-parametric methods used in this analysis do not assume any model specification and thus do not have this disadvantage. However, non-parametric methods generally require a large sample size and are not feasible to use in small data sets. Future statistical work is needed to develop robust methods that can be used when the sample size is small.
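To make the difference-in-coefficients calculation described above concrete, the following is a minimal Python sketch of the model-based estimator. It is not the method used in this study. It assumes a continuous, uncensored outcome and ordinary least squares; the simulated data, the function names (`ols_coefs`, `model_based_pte`), and all coefficient values are hypothetical. The estimate is interpretable here only because both linear models are correctly specified by construction.

```python
import numpy as np

def ols_coefs(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def model_based_pte(treat, surrogate, outcome):
    """Difference-in-coefficients PTE: 1 - (adjusted / unadjusted treatment coefficient)."""
    ones = np.ones(len(outcome))
    # Model 1: outcome ~ treatment
    b_unadj = ols_coefs(np.column_stack([ones, treat]), outcome)[1]
    # Model 2: outcome ~ treatment + surrogate
    b_adj = ols_coefs(np.column_stack([ones, treat, surrogate]), outcome)[1]
    return 1.0 - b_adj / b_unadj

# Simulated (hypothetical) data: all numbers below are assumptions.
rng = np.random.default_rng(0)
n = 5000
treat = rng.integers(0, 2, size=n).astype(float)
surrogate = -1.0 * treat + rng.normal(size=n)                  # treatment lowers the surrogate
outcome = 0.8 * surrogate - 0.2 * treat + rng.normal(size=n)   # direct effect of -0.2

pte = model_based_pte(treat, surrogate, outcome)
print(f"Model-based PTE estimate: {pte:.3f}")
```

In this simulation the total treatment effect is -1.0 and the effect remaining after adjusting for the surrogate is -0.2, so the estimate should be close to 0.8. When either model is misspecified, the same calculation can be substantially biased, which is the motivation for the model-free approach.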
The ability to use a surrogate marker to evaluate a treatment effect earlier or with less cost is the ultimate goal of surrogate evaluation. As noted above, although we focus on the surrogate information at 1 year, our approach could similarly be used to examine the PTE of surrogate information at year 2 or 3 when t=4 years. In addition, there are parallel model-free methods to investigate whether this earlier information can achieve power similar to that of a study based on diabetes incidence at year 4, which would then suggest that a shorter clinical trial duration is possible. However, using an invalid surrogate marker to make a decision about a treatment effect can have dire consequences. One could conclude that there is a treatment effect when there truly is no treatment effect on the primary outcome or, conversely, conclude that there is no treatment effect when there truly is one. The most worrisome concern is the possibility of the so-called surrogate paradox, where the treatment has a positive treatment effect on the surrogate marker, but a negative treatment effect on the primary outcome.20 33 Strong statistical assumptions are needed to ensure that this situation does not occur. Evaluating a surrogate marker using data from a randomized trial, as we have done in this study, reduces (but does not eliminate) the number of assumptions that are needed; evaluating a surrogate marker using observational data is much more difficult as there is no longer balance between the treatment groups at baseline.
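The surrogate paradox described above can be reproduced in a simple simulation. The following Python sketch is illustrative only: the data are simulated, higher values are assumed to be better for both the surrogate and the outcome, and the coefficients and the helper name `mean_difference` are hypothetical. The treatment improves the surrogate, and the surrogate is positively associated with the outcome within each arm, yet a negative direct effect makes the overall treatment effect on the outcome harmful.

```python
import numpy as np

def mean_difference(values, treat):
    """Difference in means between treated (1) and control (0) arms."""
    return values[treat == 1].mean() - values[treat == 0].mean()

# Simulated (hypothetical) randomized trial: all numbers below are assumptions.
rng = np.random.default_rng(2)
n = 10000
treat = rng.integers(0, 2, size=n)
surrogate = 1.0 * treat + rng.normal(size=n)                   # treatment raises the surrogate
outcome = 0.5 * surrogate - 1.0 * treat + rng.normal(size=n)   # negative direct effect

effect_on_surrogate = mean_difference(surrogate, treat)
effect_on_outcome = mean_difference(outcome, treat)
# Within-arm correlation between surrogate and outcome (control arm, treated arm)
corr_by_arm = [
    np.corrcoef(surrogate[treat == a], outcome[treat == a])[0, 1] for a in (0, 1)
]

print(f"Treatment effect on surrogate: {effect_on_surrogate:.3f}")
print(f"Treatment effect on outcome: {effect_on_outcome:.3f}")
print(f"Within-arm correlations: {corr_by_arm[0]:.3f}, {corr_by_arm[1]:.3f}")
```

A decision based only on the surrogate in this simulated setting would conclude that the treatment is beneficial, while the effect on the primary outcome is in the opposite direction.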
Although HbA1c is the only surrogate marker listed for type 2 diabetes in the FDA’s current table of surrogate markers/endpoints, our results found that both glucose and HOMA-IR explained more of the treatment effect on diabetes incidence than did HbA1c.4 Interestingly, for individuals at high risk for diabetes, treatments aimed at preventing or delaying a diabetes diagnosis often no longer even attempt to promise effectiveness with respect to actually preventing diabetes, but instead directly market their effectiveness on a surrogate itself, such as glucose or HbA1c. This marketing practice can be dangerous because it means that individuals eligible to take the medication are likely making the causal association on their own, for example, “if I take this drug and I lower my glucose, then I will be less likely to be diagnosed with diabetes.”
An important area for future work is the examination of potential heterogeneity in the utility of a surrogate marker.34 That is, a biomarker may be a valid surrogate marker for certain subpopulations, but not others. Park et al8 conducted a population-based cohort study using data from the Korean Genome and Epidemiology Study and compared the predictive ability of the TyG index and HOMA-IR. They found that the TyG index was a better predictor of incident diabetes than HOMA-IR and hypothesized that this may be attributable to the high glycemic index characteristics of the Korean diet and the dual dimensions of the TyG index. Specifically, while HOMA-IR reflects hepatic insulin resistance, the TyG index is thought to reflect the insulin resistance of both the adipose and hepatic tissues, resulting in the TyG index being superior in predicting diabetes incidence, particularly in Koreans who prefer to consume high glycemic index foods. In addition, previous work has demonstrated lower pancreatic volume, higher pancreatic fat content, and lower HOMA-IR among healthy Koreans compared with a matched sample of healthy white individuals.35 It is unknown whether similar heterogeneity may exist with respect to surrogacy for these markers. For example, it is possible that the PTE for the TyG index and/or HOMA-IR differs across subpopulations. Unfortunately, the racial/ethnic composition of the DPP study does not allow us to examine this, but future work is needed to explore surrogate heterogeneity, particularly in Asian populations.
For a surrogate marker to gain clinical acceptance as a valid surrogate marker for a particular outcome, multiple studies with different patient populations and using different types of analyses are often needed. Given the potential consequences of using an invalid surrogate, results from a single study are simply not enough. Unfortunately, while a plethora of clinical trials are constantly in progress, it is often difficult to gain access to data from these clinical trials. Thus, it is essential to emphasize the importance of data sharing and data access for the purpose of furthering surrogate marker research and evaluation.
Our study has some limitations. First, the DPP participant population is not necessarily representative of the general population, and thus our results may not be generalizable, particularly to populations with different racial/ethnic compositions. This is particularly relevant in surrogate marker research because concerns about transportability of surrogate information from one study to a future study may impact a decision to use the surrogate marker in a future study to test for a treatment effect. In addition, none of the examined markers explained a sufficiently high proportion (eg, over 0.90) of the treatment effect and thus the results from this study do not necessarily support replacing diabetes incidence with change in any of these markers in a future study. Despite these limitations, this study is the first to compare these five surrogate markers using a robust model-free approach, and these results provide insight into the complex relationship between the change in these markers and diabetes incidence that may be useful for future studies in diabetes prevention.