## Introduction

Clinical studies examining interventions to prevent or delay type 2 diabetes mellitus (diabetes) typically require long follow-up of participants in order to have sufficient power to detect an intervention or treatment effect. In such studies, the availability of a surrogate marker that can be used in place of the primary outcome to test for a treatment effect has the potential to decrease study time and costs, as well as patient burden.

The Accelerated Approval Program of the US Food and Drug Administration (FDA) allows drugs to be approved on the basis of demonstrated effectiveness on a surrogate marker.1 While this program allows effective drugs to reach patients in need sooner, the requirements for what constitutes a surrogate marker are not clear. The FDA describes a surrogate marker as an intermediate endpoint that measures “a therapeutic effect that is considered reasonably likely to predict the clinical benefit of a drug, such as an effect on irreversible morbidity and mortality.”1 This definition, and the program more generally, have been widely criticized for potentially allowing drugs to be approved and marketed without demonstrated effectiveness on the primary clinical outcome.2 A recent controversial example is the approval of aducanumab (Aduhelm) for Alzheimer’s disease, which was shown to reduce amyloid plaques in the brain even though the link between amyloid clearance and slower cognitive and functional decline has not been clearly established.3

In the FDA’s current table of surrogate markers/endpoints that have been used as the basis for drug approval or licensure, the only surrogate marker listed for type 2 diabetes is serum hemoglobin A1c (HbA1c).4 However, other potential surrogate markers for diabetes, such as fasting plasma glucose, are well established in the clinical literature.5,6 More recently proposed surrogate markers include the triglyceride–glucose index (TyG) and the homeostatic model assessment of insulin resistance (HOMA-IR), a widely used indirect marker of insulin resistance; both have been shown to predict diabetes incidence.7–9
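Both TyG and HOMA-IR are simple functions of fasting laboratory values. The sketch below assumes the commonly cited definitions, TyG = ln[fasting triglycerides (mg/dL) × fasting glucose (mg/dL) / 2] and HOMA-IR = fasting insulin (µU/mL) × fasting glucose (mg/dL) / 405 (equivalently, glucose in mmol/L divided by 22.5). The function names are hypothetical, the input values are made up, and the exact definitions and units used in the DPP analysis may differ.

```python
import math

def tyg_index(triglycerides_mg_dl: float, glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln(TG [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or glucose_mg_dl <= 0:
        raise ValueError("concentrations must be positive")
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2.0)

def homa_ir(insulin_uU_ml: float, glucose_mg_dl: float) -> float:
    """HOMA-IR: fasting insulin [uU/mL] * fasting glucose [mg/dL] / 405.

    405 = 22.5 * 18, i.e. the mmol/L form (insulin * glucose / 22.5)
    with glucose converted from mg/dL using a factor of 18.
    """
    if insulin_uU_ml < 0 or glucose_mg_dl <= 0:
        raise ValueError("insulin must be non-negative and glucose positive")
    return insulin_uU_ml * glucose_mg_dl / 405.0

# Illustrative (made-up) fasting values for one participant.
print(round(tyg_index(150, 100), 2))  # 8.92, i.e. ln(7500)
print(round(homa_ir(10, 100), 2))     # 2.47
```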

The statistical validation of surrogate markers is complex, and there is currently no consensus in the statistical literature on a single optimal way to validate a surrogate.10 The measure of surrogate validity most commonly used in practice is the proportion of the treatment effect on the primary outcome that is explained by the surrogate marker (proportion of treatment effect explained, PTE).11–13 While many other statistical measures have been proposed to evaluate surrogates, the PTE’s single-number summary and intuitive interpretation have led to its widespread use and acceptance as a measure of surrogate strength.14–18 For example, because the PTE is a proportion, a value close to 1.0 reflects a surrogate that explains almost all of the treatment effect on the primary outcome, while a value close to 0 reflects a surrogate that explains essentially none of it. Although there is no agreed-upon threshold for what constitutes a “good” surrogate marker, previous work has proposed that PTE estimates above 0.90 and/or PTE estimates with a 95% CI whose lower bound is greater than 0.50 or 0.75 indicate a reasonably strong surrogate marker.19 A high PTE indicates that a test of the treatment effect on the surrogate marker is likely to have satisfactory statistical power and that the treatment effect on the surrogate provides a good approximation of the treatment effect on the primary outcome. A major criticism of the PTE is that its statistical inference relies on correct parametric model specification (eg, linear regressions), which must accurately capture the complex relationship between the treatment, the surrogate marker, and the primary outcome.19–21 Fortunately, recent work has developed a non-parametric, model-free approach to estimating the PTE that does not require any parametric specification and has been shown to perform well when the true relationship between these variables is complex.21–23
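To make the PTE and the threshold heuristic concrete, the sketch below computes the classical regression-based PTE, 1 − β_S/β, where β is the treatment coefficient from regressing the outcome Y on treatment G and β_S is the treatment coefficient after also adjusting for the surrogate S. This is the parametric version whose reliance on correct model specification is criticized above, not the model-free estimator used in this study. The percentile bootstrap CI and the `looks_strong` rule (point estimate above 0.90 or CI lower bound above 0.50) are one illustrative reading of the "and/or" thresholds; all function names are hypothetical and the data are synthetic.

```python
import numpy as np

def _treatment_coef(y, *covariates):
    """OLS coefficient on the first covariate (the treatment), with an intercept."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def freedman_pte(y, g, s):
    """Parametric PTE = 1 - beta_S / beta from two linear regressions."""
    beta = _treatment_coef(y, g)        # Y ~ G
    beta_s = _treatment_coef(y, g, s)   # Y ~ G + S
    if beta == 0:
        raise ValueError("total treatment effect is zero; PTE undefined")
    return 1.0 - beta_s / beta

def bootstrap_ci(y, g, s, n_boot=200, seed=0):
    """Percentile bootstrap 95% CI for the PTE, resampling participants."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws.append(freedman_pte(y[idx], g[idx], s[idx]))
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return lo, hi

def looks_strong(pte, ci_lower, point_cut=0.90, lower_cut=0.50):
    """Hypothetical rule: strong if the point estimate or the CI lower bound clears its cut-off."""
    return pte > point_cut or ci_lower > lower_cut

# Synthetic example: the treatment acts on Y almost entirely through S,
# so the true PTE is 1 - 0.1 / (2.0 * 1.0 + 0.1), roughly 0.95.
rng = np.random.default_rng(42)
n = 5000
g = rng.integers(0, 2, size=n).astype(float)
s = g + rng.normal(size=n)
y = 2.0 * s + 0.1 * g + rng.normal(size=n)
pte = freedman_pte(y, g, s)
lo, hi = bootstrap_ci(y, g, s)
print(round(pte, 2), (round(lo, 2), round(hi, 2)), looks_strong(pte, lo))
```

If the true outcome model were non-linear in S or involved a treatment-by-surrogate interaction, these two regressions could misstate the PTE, which is the motivation for model-free alternatives.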

In this study, we construct a non-parametric estimate of the PTE using recently developed statistical techniques that do not rely on restrictive model assumptions. This model-free PTE estimate allows us to robustly measure and compare five surrogate markers (HbA1c, fasting glucose, 2-hour postchallenge glucose, TyG, and HOMA-IR) for diabetes diagnosis among patients at high risk for diabetes, using secondary data from the Diabetes Prevention Program (DPP).
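As a rough illustration of how a PTE can be estimated without a parametric outcome model and used to compare candidate markers, the sketch below uses a kernel-smoothing estimator: Δ is the difference in mean outcome between arms, Δ_S averages the smoothed contrast E(Y | S = s, G = 1) − E(Y | S = s, G = 0) over the control-arm marker values, and PTE = 1 − Δ_S/Δ. This particular estimator, the Gaussian kernel, and the rule-of-thumb bandwidth are assumptions chosen for illustration; it is not necessarily the model-free estimator used in this study. The two markers and the binary "diagnosis" outcome are synthetic, and all names are hypothetical.

```python
import numpy as np

def nw_smoother(s_train, y_train, s_eval, h):
    """Nadaraya-Watson estimate of E(Y | S = s) at s_eval, Gaussian kernel."""
    u = (s_eval[:, None] - s_train[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y_train) / w.sum(axis=1)

def kernel_pte(y, g, s, h=None):
    """PTE = 1 - Delta_S / Delta, with Delta_S averaged over control-arm S values."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    s = np.asarray(s, dtype=float)
    treated, control = g == 1, g == 0
    delta = y[treated].mean() - y[control].mean()
    if delta == 0:
        raise ValueError("no overall treatment effect; PTE undefined")
    if h is None:
        # Assumed default: Silverman-style rule-of-thumb bandwidth.
        h = 1.06 * s.std() * len(s) ** (-1 / 5)
    s0 = s[control]
    m1 = nw_smoother(s[treated], y[treated], s0, h)
    m0 = nw_smoother(s[control], y[control], s0, h)
    delta_s = np.mean(m1 - m0)
    return 1.0 - delta_s / delta

# Synthetic comparison of two hypothetical markers.
rng = np.random.default_rng(1)
n = 2000
g = rng.integers(0, 2, size=n)
markers = {
    "informative": g + rng.normal(size=n),  # shifted by treatment
    "uninformative": rng.normal(size=n),    # unrelated to treatment
}
latent = markers["informative"] + 0.2 * g + rng.normal(size=n)
y = (latent > 1).astype(float)  # synthetic binary "diagnosis"
for name, marker in markers.items():
    print(name, round(kernel_pte(y, g, marker), 2))
```

The informative marker should yield a PTE well above that of the uninformative one, which should sit near 0; the exact values depend on the bandwidth and the random draw.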