Introduction

Non-pharmacological and pharmacological interventions can decrease the incidence of type 2 diabetes in high-risk individuals. The ultimate aim of these interventions is to prevent or delay the onset of diabetes-related macro- and microvascular complications, which often lead to considerable morbidity and premature death. However, a substantial number of individuals who could benefit from such interventions are not aware of their disease risk. Numerous prognostic models and scores for type 2 diabetes have been developed [1–3] based on known risk factors, including age, sex, obesity, metabolic and lifestyle factors, family history of diabetes or ethnic background. Given that the performance of these risk scores is often far from perfect, it is desirable to identify novel prognostic factors, such as biomarkers from ‘-omics’ technologies, with the aim of achieving better model accuracy.

The main purpose of this review is to appraise the potential of novel biomarkers to improve risk prediction for type 2 diabetes. To achieve this, we will proceed in four steps. First, we will critically discuss statistical methods to compare risk scores without and with biomarkers and to quantify any potential improvement. Second, we will very briefly summarise contemporary methodology to assess the performance of diabetes risk scores based on established risk factors. Third, we will provide an overview of novel biomarkers that have either been investigated in the context of risk prediction or will soon be available for such analyses. Fourth, we will suggest approaches to make more efficient use of biomarkers, and discuss limitations which we should be aware of in our search for the optimal risk model for incident type 2 diabetes.

Methodological issues involved in the assessment and comparison of the performance of risk models using different biomarkers

As reviewed recently [4–6], it is important to consider methodological issues in the development of risk prediction models. Before the incremental value of novel biomarkers for diabetes prediction can be evaluated, several specific issues related to model performance deserve attention and will be briefly summarised.

Discrimination measures

First, suitable measures of discrimination must be selected. Discrimination is defined as how well a risk model distinguishes those who will develop a disease over the follow-up period of a cohort study from those who will not. Three common measures of discrimination are explained in the text box [7–10]. The most popular measure of discrimination is the area under the receiver operating characteristic curve (AROC) or c statistic. The interpretation of AROCs is not always straightforward. For example, an AROC of 0.8 does not mean that 80% of persons who will develop diabetes are actually identified but, rather, that there is an 80% probability that a randomly selected case (i.e. a person who will develop diabetes) will be assigned a higher estimated diabetes risk than a randomly selected non-case (i.e. a person who will remain diabetes-free). AROCs only use rank information and are quite insensitive to the addition of even strong risk predictors to an established model with reasonable predictive ability [7, 8].
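The pairwise definition of the AROC can be made concrete in a few lines. The following is a minimal sketch, not the procedure used in any of the cited studies; the function name `c_statistic` and the example risks are hypothetical, and tied risks are counted as half-concordant, which is the usual convention.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """AROC / c statistic computed from its pairwise definition.

    Every pair of one case (outcome 1) and one non-case (outcome 0) is
    compared; a pair is concordant if the case has the higher predicted
    risk. Tied risks count as half a concordant pair.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            concordant += 1.0
        elif case_risk == non_case_risk:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes (1 = incident diabetes)
risks = [0.05, 0.10, 0.30, 0.20, 0.60, 0.15]
outcomes = [0, 0, 1, 0, 1, 1]
print(c_statistic(risks, outcomes))  # 8 of 9 case/non-case pairs are concordant
```

The nested loop grows with the product of the numbers of cases and non-cases, which is fine for illustration; statistical packages typically use rank-based formulas that yield the same value.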

Measures of discrimination for prediction models

Area under the receiver operating characteristic curve (AROC, c statistic) [7, 8]

Explanation: Among all pairs in the cohort consisting of one participant with and one without incident diabetes, the AROC is the proportion of pairs for which the probability of getting the disease, as estimated by the prediction model, is larger for the case than for the non-case. Equivalently, it is the area under a plot of sensitivity (true-positive rate) vs 1 − specificity (false-positive rate).

Interpretation: An AROC of 0.5 means that the prediction model is no better than tossing a coin (in other words, the true-positive rate equals the false-positive rate, as along the diagonal of the plot). An AROC of 0.9 is excellent. An AROC of 1 is the maximum.

Advantages (+) and disadvantages (−):
+ AROCs are routinely provided by statistical packages for logistic regression models.
+ Specific tests are available to determine whether increases in AROCs upon addition of new markers to a model are statistically significant [9].
− AROCs only make use of rank information.
− They are quite insensitive to the addition of new markers to established models.
+/− They cover the whole range of cut-off values.

Net reclassification improvement (NRI) [10]

Explanation: NRIs require predefined categories of diabetes risk (e.g. 0–<5%, 5–<10%, 10–<20%, ≥20%). If a given model is compared with a new model that includes an additional marker, the NRI is calculated as follows:
+ probability of classification in a higher risk category for cases
− probability of classification in a lower risk category for cases
+ probability of classification in a lower risk category for non-cases
− probability of classification in a higher risk category for non-cases.

Interpretation: An NRI of 0 means no predictive improvement by the new marker. A large NRI indicates that a high proportion of individuals move into a more appropriate category of predicted risk.

Advantages (+) and disadvantages (−):
+ NRIs may indicate changes in risk category when changes in AROC are minimal.
+ Tests for the null hypothesis NRI = 0 are available.
− NRIs only reflect changes in predictive ability; they cannot be used to characterise the predictive ability of a given model per se.
− NRIs strongly depend on the number of risk categories and their cut-off points.
+ In the absence of established risk categories, category-free NRIs can be used for comparison purposes.

Integrated discrimination improvement (IDI) [10]

Explanation: IDIs represent a continuous version of the NRI.

Interpretation: An IDI of 0 means no predictive improvement.

Advantages (+) and disadvantages (−):
+ As with the NRI, the IDI can reveal changes in predicted risk when changes in AROC are minimal.
+ Fixing risk categories is not necessary.
− Like NRIs, IDIs only reflect changes in predictive ability.
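A common way to write the IDI is as the change in mean predicted risk among cases minus the change in mean predicted risk among non-cases when moving from the old to the new model. The sketch below assumes that formulation; the function name `idi` and all risk values are hypothetical and are not taken from any cited study.

```python
from statistics import mean


def idi(old_risks, new_risks, outcomes):
    """Integrated discrimination improvement of a new model over an old one.

    IDI = (mean new risk - mean old risk) among cases
        - (mean new risk - mean old risk) among non-cases.
    """
    case_change = mean(new - old for old, new, y
                       in zip(old_risks, new_risks, outcomes) if y == 1)
    non_case_change = mean(new - old for old, new, y
                           in zip(old_risks, new_risks, outcomes) if y == 0)
    return case_change - non_case_change


# Hypothetical risks from a basic model and from the same model plus a new marker
old_risks = [0.10, 0.20, 0.15, 0.30]
new_risks = [0.08, 0.30, 0.10, 0.40]
outcomes = [0, 1, 0, 1]
print(round(idi(old_risks, new_risks, outcomes), 3))  # 0.135
```

Because no cut-offs are involved, every shift in estimated risk contributes, which is why the IDI needs no fixed risk categories.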

When modifying risk scores, it may be most important to improve the risk stratification of individuals thought to be at intermediate risk (i.e. those with a diabetes risk in the range of 5–15%). However, the AROC and the integrated discrimination improvement (IDI) are both continuous measures of discrimination that do not require fixed cut-off points. Thus, an increase in AROC or IDI does not necessarily indicate better risk stratification of persons at intermediate risk, because such an increase could also be due to more accurate prediction in persons with an apparently poor or good prognosis [11].

The net reclassification improvement (NRI) may be more sensitive than the AROC to the incremental predictive power of new markers [7]. Moreover, when calculating NRIs, the improvement in diabetes prediction can be considered separately for cases and non-cases at low, intermediate and high risk of diabetes. However, the NRI also has some caveats [10]. In particular, it depends strongly on the number of risk categories and on the cut-off points selected for risk stratification [12]. Leening and Cook therefore strongly recommend calculating NRIs with clinically meaningful risk categories defined a priori [13], but for diabetes no standard risk categories have been established yet.
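The categorical NRI can be sketched as follows, using the example categories from the text box (0–<5%, 5–<10%, 10–<20%, ≥20%) purely as an assumption, since no standard diabetes risk categories exist. The names `risk_category` and `categorical_nri`, and the data, are hypothetical.

```python
from bisect import bisect_right

# Example risk categories from the text box: 0-<5%, 5-<10%, 10-<20%, >=20%
CUTOFFS = (0.05, 0.10, 0.20)


def risk_category(risk, cutoffs=CUTOFFS):
    """Index of the risk category; a risk equal to a cut-off falls in the higher category."""
    return bisect_right(cutoffs, risk)


def categorical_nri(old_risks, new_risks, outcomes, cutoffs=CUTOFFS):
    """Return (NRI, case component, non-case component)."""
    n = {0: 0, 1: 0}
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    for old, new, y in zip(old_risks, new_risks, outcomes):
        n[y] += 1
        old_cat, new_cat = risk_category(old, cutoffs), risk_category(new, cutoffs)
        if new_cat > old_cat:
            up[y] += 1
        elif new_cat < old_cat:
            down[y] += 1
    case_part = (up[1] - down[1]) / n[1]      # cases should move up
    non_case_part = (down[0] - up[0]) / n[0]  # non-cases should move down
    return case_part + non_case_part, case_part, non_case_part


old_risks = [0.04, 0.08, 0.12, 0.25, 0.07, 0.15]
new_risks = [0.06, 0.12, 0.09, 0.30, 0.04, 0.22]
outcomes = [1, 1, 0, 1, 0, 0]
print(categorical_nri(old_risks, new_risks, outcomes))
```

Changing `CUTOFFS` changes which moves count as reclassification, which is exactly the dependence on category number and cut-off points described above.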

In view of the strengths and drawbacks of AROC, NRI and IDI, Pencina et al suggested using all three measures of discrimination to assess the incremental value of a new marker [14]. Pencina and co-workers proposed a category-free NRI that does not depend on the selection and the number of categories but is instead based on any change in the estimated risks [15]. In the absence of established risk categories for type 2 diabetes, category-free NRIs might be useful for comparison purposes.
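In the category-free NRI, any increase or decrease in estimated risk counts as a move, so no cut-offs are needed. A minimal sketch, with a hypothetical function name and the same hypothetical data as a categorical example would use:

```python
def category_free_nri(old_risks, new_risks, outcomes):
    """Category-free NRI: P(up|case) - P(down|case) + P(down|non-case) - P(up|non-case),
    where 'up' and 'down' mean any increase or decrease in estimated risk."""
    n = {0: 0, 1: 0}
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    for old, new, y in zip(old_risks, new_risks, outcomes):
        n[y] += 1
        if new > old:
            up[y] += 1
        elif new < old:
            down[y] += 1
    return (up[1] - down[1]) / n[1] + (down[0] - up[0]) / n[0]


old_risks = [0.04, 0.08, 0.12, 0.25, 0.07, 0.15]
new_risks = [0.06, 0.12, 0.09, 0.30, 0.04, 0.22]
outcomes = [1, 1, 0, 1, 0, 0]
print(category_free_nri(old_risks, new_risks, outcomes))
```

Here the increase from 0.25 to 0.30 for one case counts as an upward move even though it stays within the ≥20% category, so the value can differ from a categorical NRI computed on the same data.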

Calibration

Besides discrimination, calibration is of particular importance in the application of prediction models. Calibration refers to how well the predicted probabilities agree with the observed diabetes risk. Calibration may be poor if the prevalence of diabetes in the dataset used to develop the score differs widely from that in the population in which the score is applied. A common test of calibration is the Hosmer–Lemeshow test, which is based on a χ² statistic. To cope with poor calibration, several methods for updating models have been suggested [16]. Updating methods range from simple recalibration methods to more sophisticated revision methods. In its simplest form, recalibration means adjusting the intercept of the prediction model while leaving the regression coefficients unchanged. A further recalibration method includes adjustment of the intercept and multiplication of all regression coefficients by the same factor (calibration slope) [16].
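The two recalibration methods can be sketched for a logistic model, assuming the original model's linear predictor (log-odds) is available for each person in the new dataset. Fitting P(y = 1) = expit(a + b·lp) with b fixed at 1 adjusts only the intercept; estimating b as well gives the calibration slope, which multiplies all original coefficients. The helper names and data are hypothetical, and the fit uses plain Newton–Raphson rather than any particular statistics package.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def recalibrate(linear_predictor, outcomes, update_slope=True, iterations=100):
    """Logistic recalibration of an existing model by Newton-Raphson.

    Fits P(y = 1) = expit(a + b * lp). With update_slope=False, b stays at 1
    (intercept-only update); otherwise b is the calibration slope.
    """
    a, b = 0.0, 1.0
    for _ in range(iterations):
        p = [expit(a + b * x) for x in linear_predictor]
        resid = [y - pi for y, pi in zip(outcomes, p)]
        w = [pi * (1.0 - pi) for pi in p]
        g_a, h_aa = sum(resid), sum(w)
        if not update_slope:
            a += g_a / h_aa
            continue
        g_b = sum(r * x for r, x in zip(resid, linear_predictor))
        h_ab = sum(wi * x for wi, x in zip(w, linear_predictor))
        h_bb = sum(wi * x * x for wi, x in zip(w, linear_predictor))
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information matrix) * step = score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical risks from an existing score that overestimates risk in a new cohort
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70]
outcomes = [0, 0, 0, 1, 0, 0, 0, 1]
lp = [logit(r) for r in risks]
print(recalibrate(lp, outcomes, update_slope=False))  # intercept-only update
print(recalibrate(lp, outcomes))                       # intercept and calibration slope
```

After intercept-only recalibration, the sum of the updated risks equals the observed number of cases; a calibration slope below 1 would indicate that the original coefficients were too extreme for the new population.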

An example of a calibration assessment is provided in electronic supplementary material (ESM) Table 1. A non-invasive diabetes prediction model was applied to its development dataset from the Cooperative Health Research in the Region of Augsburg (KORA) S4/F4 cohort. For each participant, the probability of developing the disease was estimated according to the risk score; based on these estimated probabilities, participants were ranked from the lowest to the highest estimated risk and divided into ten groups of approximately equal size (deciles). For each decile, the expected number of diabetes cases can then be calculated (number of individuals in the decile multiplied by the mean estimated risk for that decile) and compared with the observed number of incident cases. If the observed numbers of cases differ considerably from the expected numbers, the test yields a low p value, indicating poor calibration. In the example, the expected and observed numbers of cases are similar, which is reflected by a non-significant p value of 0.66.
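The decile-based procedure can be sketched as below. The statistic sums (observed − expected)² / (n·mean risk·(1 − mean risk)) over the groups and is compared with a χ² distribution with g − 2 degrees of freedom, the usual choice when the test is applied to the development data. To stay self-contained, the sketch uses the closed-form χ² tail probability that exists for even degrees of freedom (8 for ten groups); the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df:
    exp(-x/2) * sum_{i < df/2} (x/2)**i / i!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total


def hosmer_lemeshow(risks, outcomes, groups=10):
    """Hosmer-Lemeshow statistic over groups of roughly equal size (deciles by default).

    Returns (statistic, p value with groups - 2 df, per-group (n, expected, observed)).
    """
    ranked = sorted(zip(risks, outcomes), key=lambda pair: pair[0])
    n = len(ranked)
    if n < groups:
        raise ValueError("need at least one participant per group")
    statistic, table = 0.0, []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(r for r, _ in chunk)  # n_g times the mean estimated risk
        observed = sum(y for _, y in chunk)
        mean_risk = expected / n_g
        statistic += (observed - expected) ** 2 / (n_g * mean_risk * (1.0 - mean_risk))
        table.append((n_g, expected, observed))
    return statistic, chi2_sf_even_df(statistic, groups - 2), table


# Sanity check: observed equals expected in every group, so the statistic is 0
stat, p, table = hosmer_lemeshow([0.5] * 20, [0, 1] * 10)
print(stat, p)  # 0.0 1.0
```

With an odd number of groups the closed form does not apply; a general χ² survival function from a statistics library would be needed instead.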

Internal and external validation

Risk scores often overfit the datasets used for model development. This is because regression coefficients are estimated with maximum likelihood methods, so that the prediction of the outcome in the original data is optimal. Thus, AROCs obtained from the original data are often considerably larger than AROCs obtained from independent data. Because of this overfitting, a prediction model requires external validation before its widespread use.

External validation means applying the prediction model to a dataset with different individuals and re-assessing the measures of model performance. Internal validation (such as cross-validation or bootstrapping methods) is not fully equivalent to external validation, as it still relies on the original data [5]. The use of very heterogeneous datasets for external validation is a widely neglected source of error [8]. AROCs are calculated as the proportion of pairs composed of one case and one non-case in which the estimated probability is larger for the case than for the non-case. For example, in a dataset including large proportions of younger and older individuals, there are many pairs consisting of one younger, healthy person who does not develop diabetes and one older person who does. Even poor prediction models assign a larger diabetes probability to the older case than to the younger non-case, which increases the AROC. Consequently, AROCs are larger when they are calculated for younger and middle-aged individuals combined than for middle-aged individuals alone. Examples of external validation of diabetes risk scores are given in Table 1 [17–30]. Quite often, not all the risk factors included in the original score are available in the dataset used for external validation, so the original prediction models sometimes have to be modified before external validation.
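The inflation of the AROC by a heterogeneous validation sample can be reproduced with a deliberately small, hypothetical example: adding younger participants who have low estimated risk and remain diabetes-free creates many easy case/non-case pairs, although the model ranks the middle-aged participants no better than before. All values below are invented for illustration.

```python
from itertools import product


def c_statistic(risks, outcomes):
    """Share of case/non-case pairs in which the case has the higher risk (ties = 0.5)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    pairs = [(c > n) + 0.5 * (c == n) for c, n in product(cases, non_cases)]
    return sum(pairs) / len(pairs)


# Hypothetical middle-aged participants: the model ranks them only moderately well
middle_risks = [0.10, 0.12, 0.15, 0.20, 0.25, 0.30]
middle_outcomes = [0, 1, 0, 1, 0, 1]
# Hypothetical younger participants: low estimated risk, none develops diabetes
young_risks = [0.02, 0.03, 0.04]
young_outcomes = [0, 0, 0]

print(c_statistic(middle_risks, middle_outcomes))  # 6 of 9 pairs concordant
print(c_statistic(middle_risks + young_risks,
                  middle_outcomes + young_outcomes))  # 15 of 18 pairs concordant
```

The AROC rises from about 0.67 to about 0.83 solely because of the added younger non-cases.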

Table 1 Diabetes risk models with examples of external evaluation

External validation is a key component to assess the extent to which novel biomarkers can improve risk prediction. Genome-wide association studies often demonstrate a so-called ‘winner’s curse’, with more pronounced associations in discovery datasets than in replication datasets. These data from genomic studies clearly show the importance of external validation for all biomarkers before their inclusion into prediction models.

Prediction models with established, non-invasive and conventional clinical variables

Model accuracy is most commonly assessed by AROCs. In the examples given in Table 1 [17–30], AROCs of 0.71 to 0.78 have been achieved with non-invasive models, while models including measures of glycaemia or routine metabolic laboratory analyses have achieved AROCs of up to 0.85. Fasting and postload glucose levels are by themselves strong predictors of diabetes. Thus, the extent to which glycaemic measures contribute to diabetes risk scores should be discussed briefly. Individuals with elevated HbA1c (6.0–6.4% [42–46 mmol/mol]), impaired fasting glucose ([IFG] fasting glucose 6.1–6.9 mmol/l) or impaired glucose tolerance ([IGT] 2 h OGTT glucose 7.8–11.1 mmol/l) have a strongly increased risk for type 2 diabetes compared with normoglycaemic people [31]. Persons with IFG and IGT have an even higher risk of diabetes than those who have only one of the two disorders [31]. As an example, in the KORA cohort of older participants, almost half of those with IFG and IGT combined developed type 2 diabetes over 7 years [32].

The main metabolic risk factors for isolated IFG and isolated IGT are different. The pathophysiology of isolated IFG seems to include reduced hepatic insulin sensitivity, beta cell dysfunction and low beta cell mass [33]. In contrast, isolated IGT is characterised by reduced peripheral insulin sensitivity but near normal hepatic insulin sensitivity and progressive loss of beta cell function. Individuals with combined IFG and IGT exhibit severe defects in both peripheral and hepatic insulin sensitivity, as well as loss of beta cell function [33].

Although a clearly increased diabetes risk can be observed for individuals with isolated IFG and those with isolated IGT, the categorisation of individuals as either ‘normal’ or ‘pre-diabetic’ (IFG, IGT) neglects the fact that diabetes risk also increases significantly with increasing fasting glucose levels within the normal range [34]. Glycaemic measures (fasting and 2 h glucose, HbA1c) are strong diabetes risk predictors, but may be more useful without categorisation, i.e. as continuous risk factors. This has been indicated in the German KORA and the Danish Inter99 studies [18, 35].

It is of clinical importance whether a single glycaemic measure performs as well as a simple clinical score. In the multiethnic Atherosclerosis Risk in Communities (ARIC) study the simple risk score including waist circumference, height, blood pressure, family history of diabetes, ethnicity and age performed similarly to fasting glucose alone (AROC 0.71 vs 0.74, p = 0.2) [22]. Figure 1 and ESM Table 2 show that the separate addition of fasting glucose, HbA1c or 2 h glucose to basic models with non-invasive variables leads to a strong increase in model accuracy [18, 24, 36–38]. In several studies, HbA1c improved the predictive power to a similar extent to fasting glucose. In the KORA study, the strongest incremental value was seen on the addition of 2 h glucose [18]. In the Study of Health in Pomerania (SHIP) cohort, even random glucose improved the predictive ability of diabetes risk scores [36]. Thus, the predictive potential of glucose values can also be used in non-fasting participants.

Fig. 1

Increase in the AROC achieved by adding glycaemic measures to a basic prediction model: KORA S4/F4 Study. Data are from the KORA S4/F4 Study (n = 881; age range 55–74 years; 7-year follow-up) [18]. Please see ESM Table 2 for 95% CIs of AROCs. The basic model included age, sex, BMI, hypertension, parental diabetes and former or present smoking. Diabetes was ascertained by validated self-report or OGTT. *p < 0.05; **p < 0.01; ***p < 0.001 vs the basic model

Taken together, non-invasive risk factors including age, sex, BMI, waist circumference, family history, smoking or hypertension form the basis of all diabetes risk scores. Routine clinical biomarkers, such as glucose, HbA1c, lipids and uric acid, have the potential to improve the predictive ability of these basic risk factors, but AROCs rarely exceed 0.85. This argues in favour of a search for novel risk factors to further improve the accuracy of diabetes risk models.

Novel biomarkers from ‘-omics’ technologies as potential components of risk models

Despite moderate or even good model accuracy in some studies (Table 1, ESM Table 2), current prediction algorithms leave room for improvement and raise the question of whether novel biomarkers could be clinically useful, particularly if they could improve risk models that already contain measures of glycaemia. The range of molecules that could serve as potential biomarkers of diabetes risk includes genetic variants, RNA transcripts, peptides and proteins, lipids and small metabolites, cellular markers and metabolic waste products [39]. Owing to current advances in ‘-omics’ technologies, such as genomics, transcriptomics, proteomics and metabolomics, the number of candidate biomarkers keeps growing; however, only a small proportion of these has been investigated with reference to their potential to improve the prediction of type 2 diabetes.

Genetic variants

The heritability of glycaemic traits and type 2 diabetes is high [40], and the large genome-wide association studies published since the first in 2007, including up to more than 10⁵ participants, have helped us to better understand the genetic architecture of this disease. Single nucleotide polymorphisms (SNPs) in more than 60 regions throughout the genome (so-called susceptibility loci containing multiple genes) were found to be associated with the risk of type 2 diabetes [39, 41–44]. Most of these SNPs are common, with risk allele frequencies of 10–90%. Interestingly, loci associated with diabetes risk show only a partial overlap with loci that determine levels of fasting glucose, 2 h glucose and HbA1c. Thus, some loci influence both disease risk and glycaemic traits, whereas others seem to mainly regulate glucose levels within the physiological range without affecting the development of overt type 2 diabetes, and vice versa [45, 46].

Most susceptibility loci harbour genes that play a role in pancreatic development and in beta cell function in adults, whereas loci that could be linked to insulin resistance are less frequent [43, 46–48]. Other loci are enriched in genes involved in cell cycle regulation, adipocytokine signalling, CREB binding protein (CREBBP)-related transcription and regulation of circadian rhythm [43, 44]. The ongoing search for the causal variants within these loci can be expected to reveal novel pathophysiological mechanisms that may serve as therapeutic targets.

The currently known risk variants have rather modest effect sizes; the presence of each risk variant or allele is only associated with increases in diabetes risk of between 5% and 40% (ORs 1.05–1.4). Therefore, these loci do not explain more than 10–15% of the estimated genetic heritability of type 2 diabetes [44, 49]. This estimate is in line with the observation that known risk variants explain only a small fraction of family history-associated diabetes risk [50]. Combinations of up to 40 SNPs resulted in AROCs of 0.55–0.63, which is substantially lower than those achieved by age, sex and BMI alone. In some studies, the addition of genotype information to models based on established anthropometric and clinical risk factors led to statistically significant increases in AROCs, but these improvements were usually not larger than 0.03 [51, 52]. In line with the findings for AROCs, only a few studies reported improvements of NRI and/or IDI by including SNP data, but these improvements were always too low to be of clinical relevance [53, 54].
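Genetic risk scores are commonly built either by counting risk alleles or by weighting each allele count by the log of its per-allele OR. The sketch below assumes this log-additive (multiplicative OR) model without gene–gene interactions; the SNP names and ORs are hypothetical values chosen within the 1.05–1.4 range quoted above.

```python
import math

# Hypothetical per-allele odds ratios within the range quoted above (1.05-1.4)
PER_ALLELE_OR = {"snp_a": 1.05, "snp_b": 1.12, "snp_c": 1.40}


def weighted_grs(risk_allele_counts):
    """Weighted genetic risk score: sum over SNPs of the risk-allele count
    (0, 1 or 2) times the log per-allele odds ratio."""
    return sum(count * math.log(PER_ALLELE_OR[snp])
               for snp, count in risk_allele_counts.items())


def unweighted_grs(risk_allele_counts):
    """Simple count of risk alleles across all SNPs."""
    return sum(risk_allele_counts.values())


person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(unweighted_grs(person))                    # 3 risk alleles
print(round(math.exp(weighted_grs(person)), 3))  # 1.235, i.e. 1.05**2 * 1.12
```

Under these assumptions, even a person carrying both risk alleles at all three loci has an OR of only about 2.7 relative to a carrier of none, which illustrates why many modest-effect variants add little discrimination.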

It should be noted that the effect of genetic markers on risk prediction may be more pronounced in younger individuals, in leaner persons and in studies with long follow-up periods [53, 54], but few studies on young populations, in which the assessment of future genetic risk may be most relevant, are currently available [55]. The initial age of individuals is closely related to the time horizon for any model to predict type 2 diabetes. Several prospective studies have applied genetic risk scores for follow-up times of approximately 10 years. This time period corresponds to that in tools such as the Framingham Risk Score, which estimates an individual’s 10-year risk for incident cardiovascular disease. It has been proposed that genetic risk scores might be more helpful in longer term prediction because, in contrast to variables used in clinical risk scores, genetic variants do not change over time [52, 56]. Eventually, the time horizon for risk models needs to correspond to the period before the onset of type 2 diabetes in which preventive efforts are most effective.

Another caveat is that most genome-wide association and prediction studies have been conducted in populations of European descent [44, 51, 52], and case–control and prospective genetic studies in African-American [57, 58] or Asian [59–61] populations are still rare. It has been hypothesised that different risk alleles and allele frequencies in various ethnic groups could contribute to global differences in incidence rates of type 2 diabetes [62], but this needs to be corroborated in further studies.

Recent simulation studies indicate that adding hundreds or several thousand common SNPs that are currently below the threshold of genome-wide significance to prediction models may capture up to half of the risk of type 2 diabetes and thus most of the genetic component [43]. In addition to the investigation of common SNPs, ongoing DNA sequencing projects are addressing the issue of ‘missing heritability’ and are leading to the identification of further risk variants, especially variants with lower risk allele frequencies. One recent study of the MTNR1B locus, which encodes melatonin receptor 1B, indicated that this locus may contain not only common variants with small effect sizes (ORs <1.4), but also rare loss-of-function variants of the receptor with considerably stronger associations with the risk of type 2 diabetes (OR 5.7, 95% CI 2.2, 14.8) [63]. Sequencing of all protein-coding regions of the genome (exome sequencing), as recently reported for a Danish case–control study [64], and whole-genome resequencing, as performed in the 1000 Genomes Project [65], will improve our understanding of the potential relevance of low-frequency (0.5–5%) and rare (<0.5%) variants in the development of type 2 diabetes [66]. It remains to be seen to what extent ongoing studies and analyses of other kinds of genetic variation, such as copy number and structural variations, will contribute to more precise risk assessment.

Finally, it should also be noted that the problem of ‘missing heritability’ does not only concern the proportion of phenotypic variance that can be explained by known risk variants (the numerator, which will undoubtedly increase with further studies). ‘Missing heritability’ is also affected by the total phenotypic variance of type 2 diabetes attributable to genetic variants (the total heritability), which is the denominator of the ratio used to estimate the proportion of explained heritability. This total is difficult to assess accurately because it may be inflated by ill-defined environmental factors shared within families, by gene–gene interactions and by epigenetic phenomena. Therefore, a more precise quantification of total heritability is required to better define the contribution that genetic data can make to risk prediction models.
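The ratio described above can be written out explicitly. The notation is ours rather than the authors': σ²_P is the total phenotypic variance, σ²_G the variance attributable to all genetic variants and σ²_known the variance explained by known risk variants. The numbers in the second line are purely hypothetical and only show how an overestimated denominator shrinks the apparent explained fraction.

```latex
\text{proportion of heritability explained}
  = \frac{h^2_{\mathrm{known}}}{h^2_{\mathrm{total}}}
  = \frac{\sigma^2_{\mathrm{known}} / \sigma^2_{P}}{\sigma^2_{G} / \sigma^2_{P}}
  = \frac{\sigma^2_{\mathrm{known}}}{\sigma^2_{G}}

% hypothetical: same numerator, total heritability estimated as 0.50 or as 0.25
\frac{0.05}{0.50} = 0.10 \qquad \text{vs} \qquad \frac{0.05}{0.25} = 0.20
```

If shared environment, gene–gene interactions or epigenetic effects doubled the apparent total heritability, the explained proportion would be halved without any change in the known variants.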

Transcriptomics and type 2 diabetes: RNA species

mRNAs and microRNAs (miRNAs) from various tissues have been investigated as biomarkers of type 2 diabetes, mainly in small and cross-sectional studies [67]. Consequently, it is not clear whether the analysis of the human transcriptome can improve the accuracy of current risk scores. In the context of risk assessment, blood samples appear to be the most suitable biomaterial for transcriptome analyses because they are routinely obtained clinically. Methods for the analysis of transcriptomics datasets in relation to phenotypes and disease risk are currently being developed [68].

MiRNAs have been linked to insulin resistance, reduced beta cell function and type 2 diabetes [69, 70]. In the Bruneck study (South Tyrol, Italy), five miRNAs extracted from plasma were found to be associated with incident type 2 diabetes, but their performance in combination with established risk scores was not reported [71].

Gene expression is regulated at several levels, including epigenetic changes of the genome such as DNA methylation and histone modification. Commercially available bead array-based platforms can analyse DNA methylation intensities at almost 500,000 sites throughout the whole genome. The first results from studies linking epigenetic changes to glycaemic traits and type 2 diabetes risk will be available over the coming years [72].

Peptides and proteins

The complexity of the human serum or plasma proteome, consisting of approximately 10⁶ different protein species, means, on the one hand, that blood is a rich source of potential biomarkers of diabetes risk but, on the other, that the comprehensive quantification of even a substantial fraction of these peptides and proteins is extremely challenging from a technological perspective [73, 74].

A range of hypothesis-driven studies have investigated the contribution of multiple protein biomarkers such as liver enzymes, lipoproteins, insulin or markers of subclinical inflammation, iron metabolism and endothelial dysfunction to established risk scores for type 2 diabetes. A substantial increase in c statistics is possible if these prediction models do not contain a measure of glycaemia [75]. However, protein-based biomarkers that lead not only to statistically significant but also to clinically relevant improvements in model accuracy remain to be identified for models that already include glucose or HbA1c [18, 35, 76–80], as summarised in a recent review [39].

One hypothesis-free prospective study used linear matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry to characterise protein profiles in serum samples from 85 cases with incident type 2 diabetes and 195 normoglycaemic controls within the Whitehall II cohort. Six protein peaks were significantly associated with incident type 2 diabetes after adjustment for age, sex, obesity, lipids, C-reactive protein, fasting glucose and 2 h glucose, but no data on the potential improvement of prediction models by these proteins were provided [81]. However, this work can be seen as a proof-of-concept study suggesting that proteomic methods may be useful for the detection of blood proteins that play a role early in the development of type 2 diabetes.

Lipids and small metabolites

While triacylglycerols and cholesterol have been used in various risk scores resulting in only modest improvements of model accuracy [1], their subfractions and smaller lipids, as well as sugars, amino acids, organic acids, nucleotides and other small-molecule metabolites from serum or plasma samples, are less well investigated but have moved into the focus of metabolomics studies [82]. Cross-sectional approaches have identified ‘metabolic signatures’ associated with insulin resistance and type 2 diabetes [82, 83], and thus indicated their potential as prognostic biomarkers for type 2 diabetes risk.

Very recently, data on lipids and small metabolites have become available from prospective studies, and these are summarised in Table 2 [84–91]. These studies showed that elevated levels of branched-chain and aromatic amino acids and lower levels of glycine are associated with incident type 2 diabetes or deteriorating glucose homeostasis [84, 86–90]. In addition, various lipid species and lipid fractions [85, 87–90], as well as other small metabolites [87, 89, 91], showed significant associations with the risk of type 2 diabetes or incident impaired glucose metabolism after adjustment for multiple confounders. Some of the aforementioned studies compared the accuracy of prediction models without and with metabolites (Table 2) and found fairly modest improvements in AROCs for models that included metabolomics in addition to established risk factors for type 2 diabetes [84, 88, 89, 91].

Table 2 Prospective metabolomics studies in the field of type 2 diabetes

Opportunities for and limitations to the use of biomarkers for the prediction of type 2 diabetes

Opportunities: repeated measurements of biomarkers

In the aforementioned studies, associations and risk score performances were mainly based on single biomarker measurements, which are all characterised by normal intraindividual variation over time (with the exception of genetic markers). Repeated measurements of biomarkers within days or weeks could be useful to improve measurement precision, but may be inconvenient for the patient.

There is growing evidence that biomarker trajectories preceding diabetes development diverge over time between cases and non-cases [92–96]. Such trajectories require blood samples to be taken over a wider timeframe (several years or decades). These curves enable a better understanding of the pathophysiological processes of diabetes development and have been described for fasting and postload glucose, HbA1c, interleukin-1 receptor antagonist, adiponectin, alanine aminotransferase and triacylglycerols [92–96]. Deeper insight into the development of type 2 diabetes can also be expected from the analysis of established metabolic risk factors such as BMI, waist circumference, other lipids or uric acid, for which multiple measurements in the same patients over time are usually available to the treating general practitioner. In a previous analysis from the Whitehall II study [94], a 0.5 mmol/l difference in fasting glucose between later diabetes cases and non-cases 3 years before diagnosis, and a 0.3 mmol/l steeper increase in fasting glucose in later cases, were observed (Fig. 2), suggesting that fasting glucose values measured 5–10 years apart could provide better prediction of diabetes than a single glucose measurement.
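One simple way to turn a person's repeated measurements into candidate predictors is to summarise the series by its most recent value and its ordinary least-squares slope. This is only a sketch under the assumption that a linear trend is adequate; the Whitehall II trajectories cited above were derived with multilevel longitudinal models, not per-person regressions. The function name and glucose values are hypothetical.

```python
def trajectory_features(times, values):
    """Most recent value and ordinary least-squares slope (change per unit time)
    of one person's repeated biomarker measurements."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurements must be taken at different times")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    latest = max(zip(times, values))[1]  # value at the latest time point
    return latest, sxy / sxx


# Hypothetical fasting glucose (mmol/l) measured at years 0, 5 and 10
latest, slope = trajectory_features([0, 5, 10], [5.0, 5.3, 5.9])
print(latest, round(slope, 3))  # 5.9 0.09 (mmol/l per year)
```

Both features could then be entered together in a risk model, so that the slope is assessed for information beyond the latest value.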

Fig. 2

Fasting glucose trajectories before diagnosis of diabetes or the end of follow-up in the Whitehall II study. The analysis is based on 505 incident diabetes cases (triangles) and 6033 individuals who remained diabetes-free (squares). Time 0 is diagnosis for incident diabetes cases or end of follow-up for non-diabetics. Graphs are based on multilevel longitudinal modelling. Modified from [94] with permission from Elsevier

While the use of repeated measurements for the prediction of diabetes seems to be a tempting approach, as repeated measurements of different diabetes risk factors are collected in general practice, only risk factors with highly different trajectories are expected to improve the predictive ability of a given risk score [97].

It may be argued that improved prediction based on multiple compared with single measurements of glucose is obvious, but the fact that current prediction scores do not make use of multiple measurements in clinical practice to improve individual risk assessment seems noteworthy. One important concern regarding repeated measurements of risk factors is that this approach might have negative effects on disease prevention as it may delay the initiation of preventive efforts. However, if single measurements are used for risk models where repeated measurements are not available, this would not delay any preventive or therapeutic interventions.

Limitations in disease prediction

One might ask to what extent AROCs can be improved by the addition of novel biomarkers. Perfect prediction of diabetes might not be possible for at least five reasons. First, the diagnosis of diabetes is not as clear-cut as the diagnosis of other chronic diseases (e.g. cancer). As an example, diagnosing a person with diabetes when the 2 h glucose level is 201 mg/dl (11.17 mmol/l), but not when the level is 199 mg/dl (11.06 mmol/l) is, to some extent, dependent on chance and measurement imprecision. Second, measurement imprecision also applies, to a lesser or greater degree, to all predictors used in risk scores. Third, risk scores cannot capture changes in lifestyle or medication following the assessment of individual risk. Fourth, incident cases of diabetes in the cohort study used to develop a prediction model might have been missed because they occurred after the end of the follow-up period, which contributes to measurement error in the outcome of type 2 diabetes. Fifth, many novel biomarkers described as independent risk factors for type 2 diabetes are correlated with traditional risk factors or other biomarkers [98]. They therefore provide only limited incremental information and do not contribute to better discrimination. A further limitation of risk models for type 2 diabetes is that they can predict the onset of the disease, but cannot predict the onset of micro- and macrovascular complications, the major determinants of quality of life, morbidity, mortality and diabetes-related costs. Recently, several different diabetes risk scores were applied to an external set of prospective data from older individuals, and the scores did not prove to be useful in the prediction of cardiovascular diseases [99].
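The role of chance near the diagnostic cut-off can be quantified with a simple model. Assuming normally distributed analytical error with a coefficient of variation (CV) of 5% (a hypothetical value, not taken from the source), the sketch below gives the probability that a single measurement of 2 h glucose reaches the diagnostic threshold of 11.1 mmol/l (200 mg/dl) for a given true value. The function name is hypothetical.

```python
import math


def prob_measured_at_or_above(true_value, threshold=11.1, cv=0.05):
    """Probability that a single measurement reaches the diagnostic threshold,
    assuming normally distributed error with SD = cv * true_value."""
    sd = cv * true_value
    z = (threshold - true_value) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(measured >= threshold)


for true_2h_glucose in (10.5, 11.0, 11.1, 11.2, 11.7):
    print(true_2h_glucose, round(prob_measured_at_or_above(true_2h_glucose), 2))
```

A person whose true value lies exactly at the threshold is diagnosed in half of repeated measurements under this model, so outcome misclassification near the cut-off caps the attainable AROC.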

Summary and outlook

Considerable work has been done to assess the incremental value of novel markers, beyond established risk factors, for the prediction of diabetes. Nevertheless, several questions remain open.

First, the addition of biomarkers to conventional diabetes risk scores has so far improved the predictive ability of the models only slightly, if at all. This raises the question of the conditions under which novel markers might have a larger incremental value. Biomarkers are often strongly correlated with conventional risk factors, so they do not provide additional predictive information [98, 100]. Technological progress is expected to yield many novel biomarkers in the near future, but these will improve diabetes prediction only if they are at most weakly correlated with established risk factors. Moreover, it is conceivable that the slope of a biomarker trajectory (the change in the biomarker over time) captures predictive information beyond that of the most recent measurement alone. However, the potential of trajectories has not yet been assessed for diabetes prediction.
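One way such a trajectory feature could be derived is sketched below. The sketch fits an ordinary least-squares line to each person's repeated measurements and uses its slope alongside the most recent value. It assumes approximately linear change and known measurement times. The two individuals and their fasting glucose values are invented, and `ols_slope` is our own helper. As noted above, whether such slopes actually add predictive value for diabetes has not been evaluated.

```python
from typing import Sequence


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values regressed on times
    (change in the biomarker per unit of time)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("measurement times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Two hypothetical individuals with the same most recent fasting glucose
# (mmol/l) but different histories over six years of follow-up.
years = [0.0, 2.0, 4.0, 6.0]
stable = [5.9, 6.0, 5.9, 6.0]
rising = [5.0, 5.3, 5.7, 6.0]

# Candidate predictors: (last measurement, slope per year).
features_stable = (stable[-1], ols_slope(years, stable))
features_rising = (rising[-1], ols_slope(years, rising))
print(features_stable, features_rising)
```

Both individuals share a last value of 6.0 mmol/l. Their slopes, however, differ by more than an order of magnitude (about 0.01 versus 0.17 mmol/l per year), so a model using only the last measurement would treat them identically.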

Second, one might ask how good is good enough in diabetes prediction, and which criteria might be used to decide whether an individual's diabetes risk has been assessed with sufficient precision. This question can only be answered with regard to the purpose of the score. For a paper-and-pencil score used as the first step of population-wide screening, the required level of precision could be lower than for a score used to guide lifestyle recommendations and treatment for individuals in clinical practice. Furthermore, the ultimate performance measure of a novel marker will be the improvement in health outcomes achieved through therapeutic changes, together with its cost-effectiveness [100]. However, critical risk thresholds have not yet been defined for type 2 diabetes, so the question of when risk models are good enough cannot currently be answered. As already stated by Hlatky et al [100], no single metric captures all the characteristics of a novel marker. For example, AROCs use only rank information and do not indicate how accurate the predicted absolute risks are. Therefore, other criteria, such as the IDI, a goodness-of-fit test, and positive and negative predictive values, should also be reported.
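The point that AROCs reflect only rank information can be illustrated with a minimal Python sketch on eight invented individuals. The functions are illustrative helpers of our own, not taken from any package. Calibration-in-the-large (mean predicted risk minus observed event rate) is used here as a crude stand-in for calibration assessment, not as a formal goodness-of-fit test. `p_new` is not a biomarker-extended model. It is simply the old predicted risks squared, a monotone transformation that keeps the ranking but changes absolute risks.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AROC as the probability that a randomly chosen case scores higher than
    a randomly chosen non-case (ties count 1/2). Only the ranks matter."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def _mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def idi(p_old: Sequence[float], p_new: Sequence[float], outcomes: Sequence[int]) -> float:
    """Integrated discrimination improvement: mean change in predicted risk
    among cases minus mean change in predicted risk among non-cases."""
    d_cases = [n - o for o, n, y in zip(p_old, p_new, outcomes) if y == 1]
    d_controls = [n - o for o, n, y in zip(p_old, p_new, outcomes) if y == 0]
    return _mean(d_cases) - _mean(d_controls)


def calibration_in_the_large(p: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean predicted risk minus observed event rate (0 = no overall bias)."""
    return _mean(list(p)) - _mean([float(y) for y in outcomes])


def ppv_npv(p: Sequence[float], outcomes: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Positive and negative predictive values when p >= threshold means 'high risk'."""
    tp = sum(1 for r, y in zip(p, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(p, outcomes) if r >= threshold and y == 0)
    tn = sum(1 for r, y in zip(p, outcomes) if r < threshold and y == 0)
    fn = sum(1 for r, y in zip(p, outcomes) if r < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Invented toy data: 1 = incident type 2 diabetes during follow-up.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p_old = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.50]
# Monotone transformation: identical ranking, different absolute risks.
p_new = [r ** 2 for r in p_old]

print("AROC old/new:", auc(p_old, y), auc(p_new, y))
print("IDI new vs old:", round(idi(p_old, p_new, y), 4))
print("calibration old/new:", round(calibration_in_the_large(p_old, y), 4),
      round(calibration_in_the_large(p_new, y), 4))
print("PPV/NPV at 0.20 old:", ppv_npv(p_old, y, 0.20), "new:", ppv_npv(p_new, y, 0.20))
```

On these toy data, both sets of predictions have exactly the same AROC (14/15). The squared risks nevertheless give a negative IDI of about -0.099 and a larger underestimation of the observed event rate. At a fixed 20% risk threshold they also give different positive and negative predictive values. A comparison based on the AROC alone would miss all of these differences.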

Third, beyond optimising the predictive ability of diabetes risk scores, there is a wide range of issues that have not been considered in this review. From a public health perspective, it has to be asked whether diabetes risk scores are accepted by physicians and which barriers might prevent physicians from using them; how scores are best implemented in clinical practice; to what extent physicians' intuitive risk assessments agree with score-based assessments; and how effective and efficient diabetes prediction models are. These questions have hardly been addressed so far. A further issue concerns non-economic costs arising from false positive results (which could increase anxiety) and false negative risk estimates (which could lead to false reassurance). Finally, the successful implementation of any prognostic diabetes model will depend on a cost-effective intervention strategy for those identified as being at high risk of developing type 2 diabetes. This list shows that the performance of novel biomarkers in risk models needs to be investigated in a substantially broader context than at present before recommendations for their widespread use can be made with confidence.