Introduction

On a global scale, the prevalence of obesity and type 2 diabetes is increasing dramatically, and WHO reports that more than 500 million people are obese [1] and 346 million have diabetes, 90% of whom have been diagnosed with type 2 diabetes [2]. Lifestyle and environmental factors are crucially important in the development of obesity and type 2 diabetes. Important risk factors for obesity are physical inactivity, excessive energy intake, depression, sleep disorders and low socioeconomic status, while major risk factors for type 2 diabetes include obesity, especially visceral fat deposition, physical inactivity, smoking, male sex, older age, sleep deprivation, urbanisation, low socioeconomic status and ethnicity [3–6]. In addition, 40–70% of BMI variation is explained by genetic factors [7, 8] and, similarly, the increase in type 2 diabetes risk associated with having a sibling with type 2 diabetes is two- to threefold [9]. Family studies have shown similar heritability estimates of 50–60% for BMI and type 2 diabetes [10]. Since 2007, an explosion in our knowledge of specific genetic risk factors for obesity, type 2 diabetes and related phenotypes has taken place, mainly brought about by genome-wide association studies (GWASs). Here we review recent progress in concepts, methodologies and derived outcomes of studies of the genetics of type 2 diabetes and obesity, and we predict some of the directions this research field could take in the near future.

GWASs to discover the genetic basis of type 2 diabetes and obesity

Until 2007, genetic mapping of complex diseases such as type 2 diabetes and obesity was primarily achieved by genetic linkage analyses or candidate gene association studies, both of which have implicit shortcomings related to their design, which limit their application. At the same time, study sample sizes were generally too low to reach sufficient statistical power. However, progress in identifying common variants associated with type 2 diabetes and obesity has since been rapid, primarily as a consequence of technological advances in array-based genotyping, which paved the way for GWASs, together with increased sample sizes from international collaborations. These advances have led to the discovery of a wealth of genomic loci convincingly associated with complex metabolic traits [57].

Genetics of type 2 diabetes and glucose homeostasis: what is known?

As of the beginning of 2014, 90 genetic loci have been firmly established as type 2 diabetes risk loci (Fig. 1) [11–25]. The risk variant in the TCF7L2 locus, which was discovered in 2006 by a positional linkage strategy in the Icelandic population [26], remains the most influential common type 2 diabetes variant (allelic OR ~1.46) [27]. While GWASs of type 2 diabetes have been highly successful, other type 2 diabetes-associated loci have been identified through studies of quantitative diabetes-related traits. These efforts have discovered 72 loci associated with quantitative traits reflecting glucose homeostasis, i.e. fasting glucose, fasting insulin, 2 h glucose during an OGTT and HbA1c [16, 28–31]. Many of these loci are also associated with type 2 diabetes, yet the overlap between loci for these traits is not extensive (Fig. 1). In recently published reports, a mere 13 of 37 variants associated with fasting glucose were also associated with type 2 diabetes at what are considered statistically significant levels for GWASs [13, 29]. These findings indicate that some genetic variants may exert general modifying effects on fasting glucose levels in the population, while others have specific thresholds at which the genetic effect sets in, thereby conferring risk of type 2 diabetes without modifying levels of fasting glucose at the population level.

Fig. 1

Venn diagram of intersection between loci associated at genome-wide significance with type 2 diabetes, measures of adiposity and glucose homeostasis. Genome-wide significant associations for six metabolic traits are shown. Gene symbols shown in the plot are by convention the closest gene and not necessarily the functional gene

Genetic variants associated with type 2 diabetes and glucose homeostasis can shed light on the relationship between genetically induced defects in insulin secretion and insulin action in the pathogenesis of type 2 diabetes. Most of the genetic variants found in the first GWASs were demonstrated to primarily cause a decrease in glucose-stimulated insulin response [32, 33]. A more detailed picture of beta cell pathogenesis is now emerging showing the specific impact of individual risk variants through studies of more exact physiological phenotypes and functional molecular genetic studies. Specific defects in glucagon-like peptide 1-stimulated insulin secretion, glucose-stimulated insulin secretion, insulin exocytosis, insulin granule docking or post-transcriptional processing of insulin have been demonstrated to be associated with different variants, supporting the notion that a range of biological processes are involved in the pathogenesis of type 2 diabetes [33–38]. As for other epidemiological studies of insulin secretion, these efforts are impeded by the difficulty of accurately quantifying insulin secretion in an epidemiological setting. The insulin response is generally assessed as the secretion of insulin in response to a number of different secretagogues, all of which provide different physiological information [39]. Therefore, studies of this trait tend to be small and statistically underpowered to detect the modest effect of single SNPs.

In the first GWASs of type 2 diabetes and quantitative glucose homeostasis traits, few variants were shown to have an effect on insulin sensitivity [16, 40]. Of interest, the number of SNPs associated with insulin sensitivity has recently increased as a result of larger sample sizes, the inclusion of concomitant levels of obesity as a covariate in regression models and the implementation of a joint test investigating the main effect while allowing for an interaction effect [29, 30]. Thus, many primary genetic defects in insulin sensitivity may not be mediated by levels of obesity.

Genetics of obesity and measures of body fat distribution

The discovery of variants associated with measures of adiposity through GWAS follows much the same story as identification of the type 2 diabetes risk variants. The discoveries have predominantly involved individual studies and meta-analyses using BMI as a quantitative measure of adiposity. The first studies, which included relatively few individuals, identified two loci, FTO and MC4R [41, 42]. The necessity to increase statistical power led to larger studies that included ~32,000 individuals, and as a result an increased number of loci were identified [43, 44]. The largest meta-analysis performed using BMI as a measure of obesity included ~250,000 individuals and increased the number of identified BMI loci to 32 [45]. FTO, the first GWAS-identified obesity locus, remains the one with the largest effect, imposing an allelic 0.39 kg/m² increase in BMI [41].

In parallel with the studies of BMI, a crude measure of overall adiposity, GWASs of quantitative measures attempting to capture abdominal obesity and specific elements of fat distribution, such as waist circumference and WHR, have also been performed. Studies including up to ~77,000 individuals have identified 19 loci associating with measures of body composition [46–48], the vast majority associating with BMI-adjusted WHR [48]. Since most of the body composition loci have been identified by analysing WHR adjusted for BMI, no overlap exists between these and the 39 BMI loci (Fig. 1).

Within the obesity-GWAS framework, case–control approaches in children, adolescents and adults have also been widely used [49–53]. The earliest studies were relatively small and substantial overlap with both BMI and body composition loci was detected. Two recent studies have increased the number of loci to 19, identifying 15 non-overlapping loci associated with clinical obesity among children and adults [54, 55]. There are several possible explanations for the partial genetic overlap of BMI and clinical definitions of obesity. While GWASs of BMI have focused on SNPs associated with mean BMI in populations, studies indicate that effects for many loci are not uniform across the BMI distribution. In studies of childhood BMI, the effect of several loci, including FTO, was stronger in the upper tail of the BMI distribution [56]. Along the same lines, a recent study showed that FTO genotype, which has been shown to be convincingly associated with mean BMI, is associated with variance in BMI [57], and since variance in BMI increases with BMI, this observation may explain the association of FTO with both mean BMI and severe clinical obesity. In addition, these studies point to interaction between genotype and measured or unmeasured environmental factors. Overall, GWASs have to date successfully identified more than 80 different loci associated with adiposity phenotypes. These results point to the existence of aetiologically distinct subsets of extreme phenotypes.

Transferability of genetic loci across ancestry groups

Although thus far the majority of GWASs have been performed on European individuals, a number of important studies of other ethnicities are emerging. These studies have reported novel loci such as KCNQ1 and C2CD4A associated with type 2 diabetes in Japanese individuals [17, 58, 59] and a number of loci for type 2 diabetes in East Asians [20, 22]. For obesity, studies have identified risk variants in PCSK1, GP2 and GALNT10 loci in Asian or African populations [60, 61]. Of interest, studies comparing associations in individuals of different ethnicities can shed light on the shared genetic vulnerability across ethnic groups and possibly add to fine-mapping efforts in associated loci. For type 2 diabetes, studies have found directionally consistent effects for known loci across ancestry groups [62, 63]. Comparative studies across ancestries in a recent GWAS of four ancestry groups showed that the effects of the many common variants not reaching statistical significance at a genome-wide level are homogeneous across ancestry groups, and a trans-ancestry meta-analysis revealed seven novel genome-wide significant loci [25]. In addition, the study showed that fine-mapping of associated loci can be improved by taking advantage of ancestry differences in linkage disequilibrium. Similarly, studies of obesity have shown highly comparable effects of common variants across major ancestry groups, strongly supporting shared common BMI and obesity loci across populations [60, 61, 64], although ancestry-specific loci have also been shown, such as KLHL32 in Africans and KLF9 in Asians [61, 64].

Glossary

1000 Genomes Project: The 1000 Genomes Project, launched in January 2008, is an international research effort to establish a detailed catalogue of human genetic variation. Scientists planned to sequence the genomes of 2,500 participants from a number of different ethnic groups

Allele: One of a number of alternative forms of the same gene or same genetic locus

De novo mutation: An alteration in a gene that is present for the first time in one family member as a result of a mutation in a germ cell (egg or sperm) of one of the parents or in the fertilised egg itself

Epistasis: When the effect of one gene depends on the presence of one or more ‘modifier genes’ (genetic background). Also referred to as gene–gene interaction

Exome: The protein coding part of the human genome. The exome of the human genome consists of roughly 180,000 exons, constituting about 1% of the total genome, or about 30 megabases of DNA

Heritability: The proportion of phenotypic variation of a trait that is due to underlying genetic variation

Imputation: In genetics, imputation refers to the statistical inference of unobserved genotypes. It is achieved by using known haplotypes in a reference population, such as the 1000 Genomes Project, thereby allowing non-genotyped genetic variants to be tested for association with a trait of interest

Linkage disequilibrium: A non-random association between alleles at different loci

Minor allele frequency: Ranging from 0% to 50%, this is the proportion of alleles at a locus that contain the less frequent allele

Private variants: Variants restricted to probands and immediate relatives

Sequencing depth: In DNA sequencing, depth refers to the number of times a nucleotide is read during the sequencing process. Deep sequencing indicates that the depth of the process is many times larger than the length of the sequence under study

Common features of the genetic associations with type 2 diabetes, glycaemia and obesity

Implicit in the initial design, GWAS-identified variants in type 2 diabetes, glycaemia and obesity are common (minor allele frequency [MAF] >5% in the population). Risk variants exert modest effect sizes on disease risk and variation in phenotype, and for the majority of loci, the causative variant and gene are unknown. For the majority of loci, the most strongly associated variant is not a coding variant but instead resides in an intron or in a non-coding sequence between genes. In addition, high correlation (i.e. linkage disequilibrium) between physically closely located markers makes it difficult to prove causality for associated variants. However, the causative variant and the molecular mechanism of action have been identified for some loci. In the GCKR locus an intronic variant was shown to associate with type 2 diabetes and fasting glucose [65], while subsequent studies detected an amino acid-changing variant (GCKR p.P446L) that was demonstrated to be the causative variant influencing hepatic glucose uptake [66, 67]. Similarly, follow-up studies of TCF7L2 have shown that the originally identified intronic rs7903146 variant is probably the causative SNP, which presumably regulates expression of alternative TCF7L2 isoforms in several target tissues [68–71]. Many of the loci do not contain genes with known biological relevance to obesity or type 2 diabetes, providing an opportunity for novel biological investigations. Finally, common risk variants have been found in a number of genes known to be mutated in monogenic subsets of non-autoimmune diabetes (GCK, HNF1A, HNF1B, HNF4A, PPARG, KCNJ11, GLIS3 and WFS1) [11, 12, 16, 72–80] or obesity (MC4R, POMC, LEPR, BDNF, SH2B1, PCSK1 and NTRK2) [42, 45, 81–87].
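The MAF criterion used above can be made concrete with a minimal sketch. It assumes a biallelic variant with genotypes coded as 0, 1 or 2 copies of one allele; the function names and the example genotypes are hypothetical and serve only as illustration.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic variant.

    `genotypes` holds one value per individual: the number of copies
    (0, 1 or 2) of one arbitrarily chosen allele.
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    # Each individual carries two alleles, so the denominator is 2n.
    freq = sum(genotypes) / (2 * len(genotypes))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1 - freq)


def is_common(maf, threshold=0.05):
    """Apply the MAF > 5% cut-off used in the text for common variants."""
    return maf > threshold


# Made-up genotypes for ten individuals (illustration only).
example_genotypes = [0, 0, 1, 0, 2, 1, 0, 0, 1, 0]
example_maf = minor_allele_frequency(example_genotypes)
print(f"MAF = {example_maf:.2f}, common = {is_common(example_maf)}")
```

Taking the minimum of the two allele frequencies makes the result independent of which allele was counted, which is why MAF never exceeds 50%.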

Genetic overlap of obesity and type 2 diabetes: epidemiological vs genetic correlation

Although type 2 diabetes and obesity are highly interrelated from both epidemiological and pathophysiological viewpoints, the shared genetic aetiology imposed by hitherto identified common variants is limited (Fig. 1). Of 90 loci associated with type 2 diabetes and 56 loci associated with standard measures of adiposity, merely five loci are shared (FTO, MC4R, ADAMTS9, GRB14/COBLL1 and QPCTL/GIPR). Furthermore, at two of these loci (ADAMTS9 and GRB14/COBLL1), different and only partially correlated genetic variants are responsible for the associations, which brings into question whether they share functional disease mechanisms [11, 13, 14, 48]. There are many possible reasons for this apparent lack of genetic overlap. For example, the associations between genetic loci and traits shown in Fig. 1 are for associations at genome-wide statistical significance and therefore do not include shared associations below the level of significance, which may still be genuine.

Some light can be shed on the relationship between SNPs associated with type 2 diabetes or glycaemic traits and associations with measures of obesity using online large-scale databases of GWAS results. Figures 2 and 3 illustrate the correlated effects of SNPs associated with type 2 diabetes or BMI for a specific set of GWAS SNPs for a particular trait, comparing the effect on the primary trait with other correlated metabolic traits. For SNPs primarily associated with BMI, there seems to be a positive correlation between the effect size on BMI and the effect of the same SNP on type 2 diabetes (Fig. 2a). This finding indicates that the major reason why so few BMI-associated SNPs have been shown to associate with type 2 diabetes at genome-wide significance is a lack of statistical power to detect the minute derived type 2 diabetes risk increments imposed by BMI-associated variants. Similarly, there is a positive correlation between the effect of BMI SNPs on BMI and on the diabetes-related quantitative traits, fasting glucose and fasting insulin (Fig. 2b, c). In contrast, when looking at the effect sizes on BMI of SNPs associated with type 2 diabetes, there is no obvious correlation between effects; rather, it seems that most of the type 2 diabetes-associated variants have no impact on BMI per se (Fig. 3a). Similar observations are evident when comparing the effects of type 2 diabetes-associated SNPs on type 2 diabetes and WHR (Fig. 3b). A general effect on fasting glucose is seen for type 2 diabetes variants, yet no correlation with the effect on fasting insulin is evident (Fig. 3c, d). These observations are in line with the finding that most common type 2 diabetes risk variants have an intermediate impact on the ability to secrete appropriate amounts of insulin [33].

Fig. 2

Correlation of effects of BMI-associated loci on BMI in relation to effects on type 2 diabetes and metabolic traits. Each dot shows the effect of a BMI-associated variant on BMI in relation to the effect on (a) type 2 diabetes, (b) fasting glucose, (c) fasting insulin and (d) WHR adjusted for BMI. Dots coloured red indicate an association (p < 0.005) with the trait on the y-axis. Effect sizes and p values were obtained from the largest available GWAS for the trait: type 2 diabetes (n cases = 12,171 and n controls = 56,862) from the DIAGRAM Consortium (http://diagram-consortium.org/) [13]; fasting glucose (n = 58,074) and fasting insulin (n = 51,750) from MAGIC (http://www.magicinvestigators.org/) [30]; and BMI (n = 123,912) and WHR (n = 77,149) from the GIANT consortium [45, 48]. Not all variants found in GWASs are included in these graphs

Fig. 3

Correlation of effects of type 2 diabetes-associated loci on type 2 diabetes in relation to effects on BMI and metabolic traits. Each dot shows the effect of a type 2 diabetes-associated variant on type 2 diabetes in relation to the effect on (a) BMI, (b) WHR adjusted for BMI, (c) fasting glucose and (d) fasting insulin. Dots coloured red indicate an association (p < 0.005) with the trait on the y-axis. Effect sizes and p values were obtained from the largest available genome-wide association study for the trait: type 2 diabetes (n cases = 12,171 and n controls = 56,862) from the DIAGRAM consortium (http://diagram-consortium.org/) [13]; fasting glucose (n = 58,074) and fasting insulin (n = 51,750) from MAGIC (http://www.magicinvestigators.org/) [30]; and BMI (n = 123,912) and WHR (n = 77,149) from the GIANT consortium [45, 48]. Not all variants found in GWASs are included in these graphs

The findings described above are compatible with correlations between type 2 diabetes and obesity in epidemiological studies and the current view that obesity is one of the causes of type 2 diabetes. These relationships are illustrated by the effect of variation in FTO on type 2 diabetes. While initial findings showed an association with BMI and with type 2 diabetes mediated by obesity [41, 88, 89], other studies have pointed to an effect on type 2 diabetes that is independent of adiposity [90, 91]. While this may be a true adiposity-independent effect, it may also be caused by an inability to properly correct for adiposity by the use of BMI as covariate or by multiple functional mechanisms for variation in the FTO locus [92]. Nevertheless, correlation between two heritable traits in epidemiological studies does not necessarily lead to the conclusion that underlying genetic determinants between the two traits are similar. A critical feature is the amount of additive genetic variation shared between two traits, which can be estimated from family or twin studies and is expressed by a genetic correlation coefficient. If two epidemiologically correlated traits are partly heritable, the genetic correlation may still be modest or absent, indicating that different specific genetic factors are contributing to variation in the traits. Even genetically correlated traits may only share a minor fraction of quantitative trait loci [93]. Two large studies have tried to model the genetic correlation between type 2 diabetes and BMI by studying twin populations of more than 20,000 twins of Finnish and Swedish origin with long-term follow-up. Both studies found a high heritability of BMI and type 2 diabetes, yet the genetic correlation between BMI and type 2 diabetes was estimated to be ~40–45%, indicating that around one-fifth of the covariance of BMI and type 2 diabetes is due to shared genetic influences [94, 95]. Interestingly, these findings are consistent with discoveries from GWASs (Fig. 1).
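One way to see how a genetic correlation of ~40–45% relates to a figure of around one-fifth is to square the correlation coefficient, which gives the fraction of additive genetic variance the two traits would share. The sketch below shows only this arithmetic; it is an assumed reading, and the function name is hypothetical. The cited twin studies fitted full bivariate models, which this does not reproduce.

```python
def shared_genetic_fraction(genetic_correlation):
    """Square a genetic correlation coefficient.

    Under the simple reading assumed here, the square is the fraction
    of additive genetic variance shared between the two traits.
    """
    if not -1.0 <= genetic_correlation <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return genetic_correlation ** 2


# The two ends of the ~40-45% range quoted in the text.
for r_g in (0.40, 0.45):
    print(f"r_g = {r_g:.2f} -> shared fraction = {shared_genetic_fraction(r_g):.3f}")
```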

GRB14/COBLL1: a genetic locus with a pleiotropic effect on the metabolic syndrome

Comparisons of effect sizes for loci associated with adiposity measures and type 2 diabetes indicate that not all of the more than 175 loci associated with common metabolic phenotypes display pleiotropic metabolic effects. GRB14/COBLL1 is a locus that has been shown to associate with a range of traits. Initially the major allele of rs10195252 near GRB14 was shown to associate with increased WHR [48]. Other studies have shown associations of the WHR-increasing allele or moderately correlated alleles with increased risk of type 2 diabetes [13, 14, 20], increased fasting insulin [29, 30], decreased HDL-cholesterol and increased triacylglycerol concentration [96], even after adjusting for obesity measures [97]. These associations are in concordance with a general pleiotropic metabolic risk profile affecting many components of the metabolic syndrome in a clinically unfavourable direction. The nearest genes to the associated SNPs are GRB14 and COBLL1. Initial reports showed that the risk variant located upstream of GRB14 was associated with mRNA expression of GRB14 in both subcutaneous and omental fat tissues [48]. These data are supported by evidence from studies of Grb14-deficient mice showing improved glucose homeostasis despite lower circulating insulin levels and enhanced insulin signalling in liver and skeletal muscle [98]. This locus therefore seems to be an example of genetic risk variants showing metabolic pleiotropic effects.

Metabolically healthy vs metabolically unhealthy associations for obesity-associated variants

In general, alleles that associate with increasing BMI or WHR would also be expected to increase other metabolic risk variables such as cholesterol, triacylglycerol, glucose and insulin levels and risk of type 2 diabetes, thereby showing a general metabolic adverse profile. These expectations have been fulfilled for a genetic risk score constructed from 24 obesity-associated variants, which was strongly associated with insulin resistance and with risk of type 2 diabetes; yet, both associations were abolished after adjustment for BMI [99]. While some loci display such profiles, others have either limited pleiotropic effects or even show paradoxical associations. As described above, GRB14/COBLL1 is an example of an obesity-associated locus with pleiotropic effects on a range of phenotypes related to type 2 diabetes where all associations follow the expected metabolically unhealthy profile [13, 14, 29, 30, 96, 97]. Similarly, in the FTO, MC4R and GIPR loci, the alleles shown to increase obesity susceptibility are simultaneously associated with increased risk of type 2 diabetes [13, 41, 42, 45] (Fig. 1). Other obesity-associated variants exert distinct effects on other metabolic traits.

In the IRS1 locus, a variant located upstream of IRS1 was reported to be associated with increased risk of type 2 diabetes and decreased insulin sensitivity [40]. However, in a GWAS of body fat percentage the same locus emerged showing decreased body fat percentage for a perfectly correlated variant [100]. Interestingly, the allele associated with decreasing body fat percentage was also associated with decreased IRS1 expression and with an impaired metabolic profile, including an increased visceral fat:subcutaneous fat ratio, insulin resistance, dyslipidaemia, risk of type 2 diabetes and coronary artery disease and decreased adiponectin levels [100]. These findings establish IRS1 as a locus at which the allele associated with higher body fat percentage concurrently displays a metabolically healthy profile, underlining the difficulties involved in accurately assessing specific components of obesity in an epidemiological setting and the subtle differences in the function and regulation of distinct adipose compartments.

A recent meta-analysis of data on up to ~37,000 individuals systematically evaluated metabolic pleiotropic associations for BMI- and WHR-associated variants [97]. Analysis of individual variants revealed that some were associated with a metabolically unhealthy profile whereas others displayed more complex associations [97]. Genetic risk scores, generated by adding the number of risk alleles for each individual, indicated that a high score based on WHR-associated variants had an adverse effect on serum lipid levels even after adjusting for adiposity. Interestingly, analyses of the genetic risk score based on BMI-associated alleles showed a generally metabolically unhealthy profile, which was abolished after adjustment for BMI, yet analyses adjusted for BMI revealed paradoxical associations with decreased plasma glucose at 2 h during an OGTT and with decreased systolic and diastolic BP for increasing number of BMI-increasing alleles [97]. Upcoming studies with even larger sample sizes will explore relationships between associations of specific loci and a range of metabolic traits and will probably elucidate further distinct mechanisms in adipose tissue function and physiological relationships between metabolic variables. Thus, knowledge of specific genetic variation associated with metabolic complications in obesity may in theory divide obese people into metabolically healthy and unhealthy subgroups. Yet, the evidence for the existence of so-called healthy obese individuals is conflicting, and recent studies question the importance of such groupings [101, 102].
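The unweighted genetic risk score described above, in which risk alleles are simply added up for each individual, can be sketched as follows. The individuals, the number of variants and the allele counts are made up for illustration, and the function name is hypothetical; the cited study may have handled missing genotypes or weighting differently.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted genetic risk score for one individual.

    `risk_allele_counts` lists, for each variant in the score, how many
    copies (0, 1 or 2) of the risk allele the individual carries.
    """
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1 or 2")
    return sum(risk_allele_counts)


# Hypothetical cohort: three individuals genotyped at four variants.
cohort = {
    "person_a": [0, 1, 2, 1],
    "person_b": [2, 2, 1, 2],
    "person_c": [0, 0, 0, 1],
}

scores = {person: genetic_risk_score(counts) for person, counts in cohort.items()}
for person, score in scores.items():
    # With four variants the score ranges from 0 to 8 risk alleles.
    print(person, score)
```

Because every allele contributes equally, such a score ignores differences in effect size between loci; it can then be used as a single predictor in a regression on the metabolic trait of interest, with or without adjustment for BMI.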

Genetic architecture of type 2 diabetes: rare variants in a common disease?

The past 7 years of genetic discoveries brought about by the GWAS approach have meant a giant leap for genetic research of complex traits, with more than 175 genetic loci shown to associate with metabolic traits. Yet, the major part of the genetic predisposition to these phenotypes remains unaccounted for since the proportion of variance explained by genetic risk variants discovered to date is <10% for type 2 diabetes [13] and <2% for BMI [45]. There has been much focus on this missing heritability [103], which has revived the discussion of the overall composition of the genetic susceptibility, the genetic architecture, of type 2 diabetes, obesity and similar complex diseases. Hence, the ‘common variant–common disease hypothesis’ [104, 105], stating that the predisposition to common diseases stems from a moderate number of common variants, has largely been refuted. On the other hand, the ‘rare variant hypothesis’ [106, 107], which suggests that rare alleles with large effects are the primary drivers of common disease, has received renewed attention. Rare variants are common in the sense that they vastly outnumber common variants in the human genome [108, 109]. Furthermore, evolutionary theory predicts that disease alleles should be rare, since even a minute fitness reduction will keep allele frequencies low as a result of negative selection [110]. In contrast, the ‘thrifty genotype hypothesis’ states that genetic variation advantageous during human evolution might now confer risk of disease owing to changes in living conditions and environmental exposures. Positive selection of genotypes increasing energy storage would drive such variations to high frequency [111]. However, a recent study of 65 common type 2 diabetes risk variants did not support this hypothesis [112]. Several examples of the association of rare variants with common disease have been demonstrated. For example, studies of candidate genes in fasting HDL-cholesterol [113] or obesity [114–116], GWAS loci in inflammatory bowel disease [117] and the GWAS-detected MTNR1B in type 2 diabetes [118] have all demonstrated rare variants with individual or combined impact on risk of common, complex diseases. Alternative models describing the genetic architecture of common diseases have been suggested. In the ‘infinitesimal model’, genetic susceptibility is composed of thousands of common variants, each with minute effects on disease risk [119]. Current findings from GWASs seem to be somewhat consistent with this model. For example, the current count of common variants regulating circulating fasting lipids is 157 [120], and estimations from the distribution of association signals in GWAS data of type 2 diabetes point to the existence of more than 400 common, low-effect loci [13]. Furthermore, for many traits, GWASs have shown that increasing sample size leads to seemingly endless discoveries of variants with minor effects [13, 45, 120], reflecting the fact that the loci detected to date are merely the highest effect sizes according to a curve of distribution.
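To illustrate why even the strongest common variants account for so little of the variance explained, the standard additive-model approximation 2p(1 − p)β²/σ² can be applied to a single SNP. The sketch below assumes Hardy–Weinberg equilibrium. The per-allele effect of 0.39 kg/m² is the FTO estimate quoted earlier in this review, whereas the allele frequency (0.4) and the BMI standard deviation (4.5 kg/m²) are assumed values chosen only for illustration.

```python
def variance_explained(allele_frequency, effect_per_allele, trait_sd):
    """Approximate fraction of trait variance explained by one SNP.

    Uses the additive model: genotype variance under Hardy-Weinberg
    equilibrium is 2p(1 - p), so the SNP contributes
    2p(1 - p) * beta^2 to the trait variance.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if trait_sd <= 0:
        raise ValueError("trait standard deviation must be positive")
    genotype_variance = 2 * allele_frequency * (1 - allele_frequency)
    return genotype_variance * effect_per_allele ** 2 / trait_sd ** 2


# 0.39 kg/m2 per allele is taken from the text; 0.4 and 4.5 are assumptions.
fto_fraction = variance_explained(
    allele_frequency=0.4, effect_per_allele=0.39, trait_sd=4.5
)
print(f"Approximate variance explained: {fto_fraction:.2%}")
```

Under these assumed inputs a single locus explains well under 1% of the variance in BMI, which is in keeping with the small cumulative proportion reported for all BMI loci combined.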

Each of these models describes extreme scenarios concerning the genetic architecture of common diseases; however, they are not mutually exclusive. Parts of all these hypotheses may contribute to the genetic architecture of type 2 diabetes and other common diseases, raising the possibility of a complex composite model in which context-dependent variation of a range of frequencies, individually or in specific combinations, contributes to genetic susceptibility. The existence of such a composite model is supported by a recently reported simulation analysis, the investigators of which concluded that extreme models are unlikely, yet the simulation data were consistent with many models, including those in which rare variants explain either little or most of type 2 diabetes heritability [121].

In addition, influences other than main effects of SNPs may explain parts of the susceptibility for metabolic disease. Preliminary evidence supports the role of copy number variations and gene–environment interactions in obesity and type 2 diabetes, yet few solid findings have been reported [122–124]. Furthermore, specific parent-of-origin effects have been shown for type 2 diabetes [125], and future large-scale studies may reveal gene–gene interactions. Of interest, the heritability of type 2 diabetes and obesity may have been overestimated because of failure to account for epistasis among loci, and parts of the missing heritability may be explained if epistatic effects exist [126].

Strategies for identifying disease-associated variants across the allele frequency spectrum

New approaches are currently being used in the search for the determinants of genetic susceptibility to metabolic diseases and other complex traits. Nucleotide sequencing by high-throughput methods, commonly referred to as next-generation sequencing [108, 127], is being extensively used in direct association studies, to build improved reference panels for imputation of common and low-frequency variation, and to guide the content of novel genotyping arrays (Fig. 4).

Fig. 4

Suggested study and analysis designs of genetic studies of variants across the allele frequency spectrum. The x-axis designates genetic variation across the allele frequency spectrum. (a) shows suggested study and analytical designs, (b) shows the recommended data generation strategies and technologies

GWAS in the current setting with HapMap (http://hapmap.ncbi.nlm.nih.gov/) or 1000 Genomes (www.1000genomes.org/) imputation has proved to be a powerful tool for finding common disease-associated variants, and increasing the quality of the reference panels used in imputation will increase the genomic coverage of common variants towards an upper limit. In European populations, the current versions of imputation-based association studies are effectively limited to capturing variants at an allele frequency above ~1% [128, 129], yet in other populations this approach can also shed light on rare variants. For example, DeCODE Genetics has published a series of reports on the role of rare variants in complex diseases and traits in the Icelandic population, based on chip genotyping in large numbers of individuals and whole genome sequencing in a subset, combined with long-range phasing and genealogy-based imputation [15, 127, 130].

Low-frequency variants (MAF 0.5–5%) are probably best studied in a design similar to GWASs (i.e. single SNP association analyses) on data obtained from either genome-wide genotyping and imputation or from targeted array-based genotyping informed by sequencing studies (Fig. 4). A number of studies focused on the detection of low-frequency variants through whole exome sequencing and/or targeted genotyping have been reported [14, 131–133]. Up to now, such studies have had limited success, probably as a consequence of limited sample size and statistical power. For rare variants (MAF <0.5%), a simple extension of the GWAS paradigm is probably not sufficient since single variant tests are underpowered for the detection of such variants. To circumvent the lack of statistical power of single marker tests, several collapsing or burden methods that simultaneously analyse multiple rare variants are applied [134, 135]. These methods test the cumulative effect of multiple rare variants in a genomic unit, such as a gene or pathway. So far, few whole exome or genome sequencing studies applying such methods to the investigation of rare variants in complex diseases have been published. In a recent study of the exomes of 2,000 individuals, we were unsuccessful in our aim of discovering genes harbouring rare variants associated with type 2 diabetes [136]; however, larger sample sizes are needed to make general inferences of the importance of rare variants in the genetic architecture of metabolic traits. Ongoing work in international consortia will probably lead us closer to answering this question, yet it may be expected that large numbers of novel loci will not be identified until massive sample sizes are achieved. Of importance, the best grouping of coding rare variants in burden tests is as yet undecided. Most published studies have included disruptive and missense variants below a certain frequency threshold, but including many neutral missense variants in burden tests will decrease statistical power [126]. However, dividing rare missense variants based on functionality is not a trivial task. In silico functional prediction methods are not accurate, although studies indicate that the best of these methods can be valuable in filtering rare missense variants [126]. Future integration of high-throughput functional genetic investigations may further improve studies of rare coding variants. Extending such paradigms to non-coding regions is a great challenge since the functional characterisation and prediction of non-coding variation is less advanced.
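
To make the collapsing idea concrete, the following is a minimal sketch of one simple form of burden test, not the specific methods of the cited studies. It assumes that each individual is reduced to a carrier/non-carrier indicator for any variant in a gene below a frequency threshold (here MAF <0.5%, as in the text), and that carrier counts in cases and controls are compared with a one-sided Fisher's exact test. The function names, the toy genotypes and the allele frequencies are all hypothetical and serve only as illustration.

```python
from math import comb


def collapse_carriers(genotypes, mafs, maf_threshold=0.005):
    """Return one carrier flag per individual.

    genotypes: one list per individual holding minor-allele counts (0/1/2),
               one entry per variant in the gene (hypothetical toy layout).
    mafs:      minor allele frequency of each variant, in the same order.
    Variants at or above maf_threshold are ignored.
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [any(person[j] > 0 for j in rare) for person in genotypes]


def fisher_exact_one_sided(a, b, c, d):
    """P(case carriers >= a) for the 2x2 table [[a, b], [c, d]].

    a/b: carriers/non-carriers among cases,
    c/d: carriers/non-carriers among controls.
    """
    n_cases, n_controls, carriers = a + b, c + d, a + c
    denom = comb(n_cases + n_controls, carriers)
    upper = min(carriers, n_cases)
    return sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(a, upper + 1)
    ) / denom


def burden_test(genotypes, is_case, mafs, maf_threshold=0.005):
    """Collapse rare variants per individual, then test for carrier enrichment in cases."""
    carrier = collapse_carriers(genotypes, mafs, maf_threshold)
    a = sum(1 for c, y in zip(carrier, is_case) if c and y)
    b = sum(1 for c, y in zip(carrier, is_case) if not c and y)
    c_ = sum(1 for c, y in zip(carrier, is_case) if c and not y)
    d = sum(1 for c, y in zip(carrier, is_case) if not c and not y)
    return (a, b, c_, d), fisher_exact_one_sided(a, b, c_, d)


# Toy data (invented): three variants in one gene; the third is common and is excluded.
mafs = [0.001, 0.002, 0.20]
genotypes = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 2], [0, 0, 0], [0, 1, 1],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 2], [1, 0, 0], [0, 0, 0], [0, 0, 1],  # controls
]
is_case = [True] * 6 + [False] * 6

table, p_value = burden_test(genotypes, is_case, mafs)
print(table, p_value)
```

In this sketch every rare variant counts equally, so neutral missense variants dilute the carrier signal in the way described above; restricting the `rare` index list to variants passing a functional filter would be one way to explore that trade-off.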

Family studies revisited in genetic studies of type 2 diabetes?

Single variant and burden analyses in large-scale exome or genome data will enable detection of rare variants that segregate in the population. Variations that are specific to the individual or to a single family (so-called private variants) are not present in the general population at any reasonable frequency but constitute a large fraction of all variation and are very difficult to disentangle. From the lessons learned from the study of Mendelian diseases it is plausible that private or family-specific variants with a relatively high impact on disease risk exist; however, to date, the importance of such variants in genetic architecture remains elusive. If such variants cluster within specific genes they may be detectable in large-scale sequencing studies of unrelated individuals, although locus heterogeneity will severely impede detection. Alternatively, private variations may be studied by deep sequencing in extended, multi-generational families, drawing on advantages of accurate family-based imputation of variation and the fact that more observations of the rare variations can be made in large families [137]. However, since both obesity and, especially, type 2 diabetes are rather late-onset diseases, recruiting such families is challenging. While previous genome-wide linkage studies were not particularly successful in mapping disease genes, it is evident that these studies were statistically underpowered in the presence of genetic heterogeneity and that variants associated with less than a fourfold increase in the risk of disease are expected to generate inconsistent linkage results [138]. These observations make it likely that undiscovered private or family-specific variants with moderate to high impact segregate in families. Yet, the effect of such variants is extremely difficult to prove with statistical confidence.

De novo mutations – a novel risk factor in metabolic disease?

Family studies make it possible to identify de novo mutations and seek to associate them with disease (Fig. 4). To detect de novo mutations, sequencing data in trios are needed to distinguish rare family-specific mutations from de novo mutations, and association to disease can be done in extended pedigrees or in phenotypically discordant sibling pairs. A number of reports have demonstrated that de novo copy number variations and point mutations are rare but high-impact risk factors in autism spectrum disorders [139, 140]. Yet, the influence of such variants on the risk of type 2 diabetes and metabolic disease is currently unknown. However, upcoming exome or genome sequencing studies, several of which have a family-based design, may lead to discoveries in this research area. Of interest, the diversity in the mutation rate is influenced by the age of the father at conception of the child [141]. Furthermore, de novo mutations have been shown to be a frequent cause of permanent neonatal diabetes, and de novo mutations in HNF1B are a cause of MODY5 [77, 142]; however, discovery of de novo risk variants is most likely primarily possible for variants with large effects on disease risk, which place them under selective pressure [126].
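
The trio logic described above can be sketched as follows. This is an illustrative simplification that assumes error-free genotype calls and complete coverage in all three family members; in real sequencing data, candidate de novo calls would additionally need filtering on read depth and genotype quality. The function names and variant identifiers are hypothetical.

```python
def candidate_de_novo(child, mother, father):
    """Variants carried by the child but absent from both parents.

    Each argument maps a variant identifier to the alternate-allele count
    (0, 1 or 2). A variant missing from a dictionary is treated as
    homozygous reference, which assumes the site was confidently called.
    """
    return sorted(
        variant
        for variant, count in child.items()
        if count > 0
        and mother.get(variant, 0) == 0
        and father.get(variant, 0) == 0
    )


def inherited_variants(child, mother, father):
    """Variants carried by the child and by at least one parent (family-specific or common)."""
    return sorted(
        variant
        for variant, count in child.items()
        if count > 0
        and (mother.get(variant, 0) > 0 or father.get(variant, 0) > 0)
    )


# Toy trio (invented identifiers and genotypes).
child = {"chr1:1000:A>G": 1, "chr2:2000:C>T": 1, "chr3:3000:G>A": 1}
mother = {"chr1:1000:A>G": 1}
father = {"chr3:3000:G>A": 1, "chr4:4000:T>C": 1}

print(candidate_de_novo(child, mother, father))
print(inherited_variants(child, mother, father))
```

Without parental data the three variants in the child would be indistinguishable, which is why trio sequencing is needed to separate rare inherited variants from de novo events.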

Summary

• GWASs have established more than 175 genetic risk variants for human type 2 diabetes, glycaemia and adiposity

• The shared genetic aetiology of type 2 diabetes and obesity discovered in GWASs is limited

• According to GWAS data, the effect of obesity-associated variants on adiposity correlates with their effect on type 2 diabetes, indicating a general effect of these loci on type 2 diabetes

• The obesity-associated variants display both metabolically healthy and unhealthy secondary associations

• Ongoing studies are seeking to elucidate the impact of rare variants in the pathogenesis of type 2 diabetes and obesity

Future directions

There have recently been dramatic changes to the technological and conceptual approaches used in genetic research of complex traits, leading to great advances. GWASs have been efficient in identifying common risk variants for metabolic disease, yet major efforts are still needed to gain biological knowledge from discoveries.

The search for genetic risk factors for type 2 diabetes and obesity is now targeting low-frequency and rare variations, and combining major data sets will presumably enable discoveries within the coming years, at first primarily focused on coding variation. However, we are on the path towards new technological and methodological developments, which will allow for large-scale genome sequencing [143]. Such developments will take genome sequencing to the population scale, possibly leading to a new understanding of the composite genetic architecture of human complex traits, integrating diverse kinds of genetic variation. For example, the study of copy number variations has led to important findings of association with severe obesity [122]. Furthermore, systemic integration of complex data obtained from other ‘omics’ techniques such as transcriptomics, proteomics and metabolomics and modelling of the combined composite impact of common metabolic phenotypes is projected to lead to breakthroughs in understanding the genetic determinants of metabolic traits.

As discussed above, common genetic variants consistently impose modest risk increments on type 2 diabetes and adiposity. Furthermore, combining these variants does not enable prediction of type 2 diabetes [144, 145] or obesity [45, 146]. Discovered variants explain a modest part of the heritability of metabolic diseases and future studies may reveal further important genetic susceptibility elements. Besides gaining biological knowledge and allowing the identification of at-risk individuals, hopes have been high that a knowledge of genetic risk factors would lead to personalised treatment based on the genetic profile. While detailed, sufficiently statistically powered studies of the genetic influence on treatment outcomes are still lacking, lessons from monogenic metabolic disease suggest that the identification of genetically homogeneous groups may lead to improvements in individualised treatment [147]. In terms of the extent to which treatment can be individualised based on genetic information, it is possible that this will range from monogenic subsets, for whom highly individualised treatment can be provided, to genetic identification of pathophysiologically specific subgroups of patients for whom stratified treatment can be given, and, finally, to a highly heterogeneous patient group, for which genetic profile knowledge will not add significantly to clinical care. Yet, with the ever-falling costs of genome sequencing, we will reach a situation in the future where all patients have their full genome sequenced, thereby allowing large-scale and more accurate studies of the genetic impact on treatment outcome in different patient strata.