The genetic architecture of type 2 diabetes

Abstract

The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has long been debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common, and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a major role in predisposition to type 2 diabetes.

Figure 1: Ascertainment of variants and single-variant results.
Figure 2: Association between T2D and variants in genes for Mendelian forms of diabetes.
Figure 3: Empirical T2D association results compared to results under different simulated disease models.

Accession codes

Data deposits

Whole-genome sequence data from the GoT2D project are available by application to the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega/home) under accession number EGAS00001001459 and from dbGaP (http://www.ncbi.nlm.nih.gov/gap) under accession number phs000840.v1.p1. Whole-exome sequence data from the T2D-GENES project are available from the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega/home) under accession number EGAS00001001460 and from dbGaP (http://www.ncbi.nlm.nih.gov/gap) under accession numbers phs000847.v1.p1, phs001093.v1.p1, phs001095.v1.p1, phs001096.v1.p1, phs001097.v1.p1, phs001098.v1.p1, phs001099.v1.p1, phs001100.v1.p1 and phs001102.v1.p1. Summary-level data from the exome array component of this project (and from the exome and genome sequences) can be freely accessed at the Accelerating Medicines Partnership T2D portal (http://www.type2diabetesgenetics.org), and similar data from the GoT2D-imputed data at http://www.diagram-consortium.org.

Acknowledgements

Grant support and acknowledgments are listed in the Supplementary Information.

Author information


Contributions

Author contributions are described in the Supplementary Information.

Corresponding authors

Correspondence to Michael Boehnke or Mark I. McCarthy.

Ethics declarations

Competing interests

R.A.D. has been a member of advisory boards for Astra Zeneca, Novo Nordisk, Janssen, Lexicon and Boehringer-Ingelheim; has received research support from Bristol Myers Squibb, Boehringer-Ingelheim, Takeda and Astra Zeneca; and is a member of speakers’ bureaus for Novo Nordisk and Astra Zeneca. J.C.F. has received consulting honoraria from Pfizer and PanGenX. M.I.M. has received consulting and advisory board honoraria from Pfizer, Lilly and Novo Nordisk. G.M. and P.D. are co-founders of Genomics PLC, which provides genome analytics. D.A. is an employee of and holds equity in Vertex Pharmaceuticals.

Extended data figures and tables

Extended Data Figure 1 Summary of samples and quality control procedures.

This figure summarizes data generation for whole-genome sequencing (GoT2D), exome sequencing (GoT2D and T2D-GENES), exome array genotyping (DIAGRAM), and GWAS imputation (DIAGRAM).

Extended Data Figure 2 Power for single and aggregate variant association.

a–g, Power to detect single-variant association (α = 5 × 10⁻⁸) at varying minor allele frequencies (x-axis) and allelic ORs (y-axis) for seven effective sample size (N_eff) scenarios relevant to the genome (a–c) and exome (d–g) components of this project. a, Variant observed in 2,657 samples (the effective size of the GoT2D integrated panel). b, Variant observed in 28,350 samples (the effective size of the imputed data set). c, Variant observed in the GoT2D integrated panel and the imputed data set (effective sample size 31,007). d, Ancestry-specific variant in 2,000 samples (the size of each of the non-European exome sequence data sets). e, European-specific variant in 5,000 samples (the combined size of the European exome sequence data sets). f, Variant observed with shared frequency across all ancestry groups in 12,940 samples (the size of the combined exome sequence data set). g, Variant observed in the combined exome array and sequencing data set (effective sample size 82,758). h, i, Power of the gene-based association test (SKAT-O) according to liability variance explained. In h, 50% of the variants contribute to disease risk and the remaining 50% have no effect; in i, all variants contribute to disease risk. In each, the sample sizes considered are 2,000 (ancestry-specific effects; green) and 12,940 (ancestry-shared effects; blue). Power is shown at two significance levels (α = 2.5 × 10⁻⁶ and α = 0.001). These simulations show that under the optimistic model, in which effects are shared across all ancestry groups (blue line) and all variants contribute, power exceeds 60% for 1% of variance explained at α = 2.5 × 10⁻⁶. However, power declines rapidly if either assumption is relaxed.
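The single-variant power surfaces in panels a–g can be approximated analytically. The sketch below is a minimal illustration, not the simulation procedure behind this figure. It assumes a balanced case-control design, a per-allele (multiplicative) odds ratio, a Wald test on the log allelic OR with Woolf's standard error, and N_eff = 4/(1/N_cases + 1/N_controls), a common convention for effective sample size that is not stated in this caption. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Size of the balanced study with equivalent power (assumed convention)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def allelic_test_power(maf: float, odds_ratio: float, n_eff: float,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test on the log odds ratio.

    Assumes n_eff / 2 cases and n_eff / 2 controls, so each group contributes
    n_eff alleles, and treats `maf` as the control allele frequency.
    """
    # Case allele frequency implied by the per-allele odds ratio.
    case_odds = maf / (1.0 - maf) * odds_ratio
    case_freq = case_odds / (1.0 + case_odds)
    n_alleles = n_eff  # (n_eff / 2 individuals) x 2 alleles per group
    # Woolf standard error of log(OR) from expected allele counts.
    se = math.sqrt(
        1.0 / (n_alleles * case_freq) + 1.0 / (n_alleles * (1.0 - case_freq))
        + 1.0 / (n_alleles * maf) + 1.0 / (n_alleles * (1.0 - maf))
    )
    mean_z = math.log(odds_ratio) / se
    critical = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value under the alternative.
    return STD_NORMAL.cdf(mean_z - critical) + STD_NORMAL.cdf(-mean_z - critical)


if __name__ == "__main__":
    # Panel b sample size with an illustrative MAF of 5% and OR of 1.3.
    print(round(allelic_test_power(0.05, 1.3, 28_350), 3))
```

Because the test is two-sided, an OR of 1 returns exactly the significance level, which is a quick sanity check on the formula.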

Extended Data Figure 3 Single variant analyses.

a–c, Manhattan plots of single-variant analyses generated from exome sequence data in 6,504 cases and 6,436 controls of African American, East Asian, European, Hispanic, and South Asian ancestry (a); from exome array genotypes in 28,305 cases and 51,549 controls of European ancestry (b); and from the combined meta-analysis of exome array and exome sequence samples (c). Coding variants are categorized according to their relationship to the previously reported lead variant in each GWAS region. Loci achieving genome-wide significance only in the combined analysis are highlighted in bold. The HNF1A variant reaching genome-wide significance in the combined analysis is synonymous (Thr515Thr). The dashed horizontal line in each panel marks the genome-wide significance threshold (P < 5 × 10⁻⁸).
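Panel c combines the exome array and exome sequence results. The meta-analysis method is described in the Methods rather than here, so the sketch below assumes a fixed-effect, inverse-variance-weighted meta-analysis of per-study log odds ratios (one scheme offered by tools such as METAL) and tests the pooled result against the P < 5 × 10⁻⁸ line drawn in each panel. Function names and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple

GENOME_WIDE_ALPHA = 5e-8  # dashed line in each Manhattan plot


def inverse_variance_meta(betas: Sequence[float],
                          ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect meta-analysis of per-study log odds ratios.

    Returns the pooled log OR, its standard error and a two-sided P value.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return pooled, pooled_se, p_value


def is_genome_wide_significant(p_value: float) -> bool:
    return p_value < GENOME_WIDE_ALPHA


def neg_log10(p_value: float) -> float:
    """Y-axis value of a Manhattan plot."""
    return -math.log10(p_value)


if __name__ == "__main__":
    # Hypothetical exome-array and exome-sequence estimates for one variant.
    beta, se, p = inverse_variance_meta([0.12, 0.10], [0.03, 0.02])
    print(beta, se, p, is_genome_wide_significant(p))
```

Inverse-variance weighting lets the larger exome array component dominate the pooled estimate, which is consistent with some loci reaching significance only in the combined analysis.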

Extended Data Figure 4 Classification of coding variants according to their relationship to reported lead variants for each GWAS region.

The ideogram shows the location of the 25 coding variant associations at 16 loci described in the text. The number in each circle gives the number of associated variants at that locus. Variants are grouped into five categories based on their inferred relationship with the GWAS lead variant. For some of these categories, the figure includes representative regional association plots based on exome array meta-analysis data from 28,305 cases and 51,549 controls; the locus displayed for each category is shown in bold. In each panel, the first plot shows the unconditional association results, the middle plot the results after conditioning on the non-coding GWAS SNP, and the last plot the results after conditioning on the most significantly associated coding variant. Each point represents a SNP in the exome array meta-analysis, plotted with its P value (on a −log₁₀ scale) against genomic position (hg19). The lead coding variant is shown as a purple symbol. The colour of every other SNP indicates its LD with the lead SNP, estimated as European r² from the 1000G March 2012 reference panel: red, r² ≥ 0.8; gold, 0.6 ≤ r² < 0.8; green, 0.4 ≤ r² < 0.6; cyan, 0.2 ≤ r² < 0.4; blue, r² < 0.2; grey, r² unknown. Gene annotations are taken from the University of California Santa Cruz genome browser. GWS, genome-wide significance. *Seven variants (three at ASCC2 and one each at THADA, TSPAN8, FES and HNF4A) did not achieve genome-wide significance themselves but are included because they fall in genes and/or regions with other significant association signals (see text).
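The LD colour scheme in the regional association plots is a fixed binning of r² with the lead variant. A minimal sketch of that mapping follows, using the bin edges listed in the caption. It assumes r² values are already computed (for example, from the 1000G European panel); the function and constant names are hypothetical.

```python
from typing import Optional

# Lower bound of each colour bin, checked from highest to lowest.
LD_BINS = ((0.8, "red"), (0.6, "gold"), (0.4, "green"), (0.2, "cyan"))


def ld_colour(r2: Optional[float]) -> str:
    """Colour for a SNP given its r^2 with the lead variant (None = unknown)."""
    if r2 is None:
        return "grey"
    if not 0.0 <= r2 <= 1.0:
        raise ValueError(f"r2 must lie in [0, 1], got {r2}")
    for lower_bound, colour in LD_BINS:
        if r2 >= lower_bound:
            return colour
    return "blue"  # r^2 < 0.2


if __name__ == "__main__":
    for value in (0.92, 0.65, 0.3, 0.05, None):
        print(value, ld_colour(value))
```

Each bin is closed at its lower edge, so a SNP with r² exactly 0.8 is coloured red, matching "r² ≥ 0.8" in the caption.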

Extended Data Figure 5 Exclusion of synthetic associations and construction of credible causal variant sets at T2D GWAS loci.

Ten T2D GWAS loci were selected for synthetic association testing (P < 0.001; see Methods). a, Effect size at the GWAS index SNV (sequence data) before (navy blue) and after (light blue, grey) conditioning on candidate rare and low-frequency (MAF < 5%) variants that could produce a synthetic association. b, Example of synthetic association exclusion at the TCF7L2 locus. Error bars represent 95% confidence intervals for the index SNP odds ratio as rare variants are greedily added to the model. c, Size of credible sets at T2D GWAS loci when constructed from the GoT2D data, compared with their size when restricted to variants in the 1000G or HapMap data.
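Panel c compares credible set sizes, but the caption does not describe how the sets are built. The sketch below uses one widely used approach as an assumption: Wakefield approximate Bayes factors computed from each variant's log OR and standard error, posterior probabilities under a single-causal-variant model with equal priors, and a set grown in decreasing posterior order until a target cumulative probability is reached. The 99% default, the prior standard deviation of 0.2 on the log-OR scale, and all names are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def log_approx_bayes_factor(beta: float, se: float, prior_sd: float = 0.2) -> float:
    """Log Wakefield approximate Bayes factor (association vs. no association)."""
    v = se ** 2                    # sampling variance of the log OR
    w = prior_sd ** 2              # prior variance of the true log OR
    shrink = w / (v + w)
    z_sq = (beta / se) ** 2
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z_sq * shrink


def credible_set(ids: Sequence[str], betas: Sequence[float], ses: Sequence[float],
                 level: float = 0.99, prior_sd: float = 0.2) -> List[Tuple[str, float]]:
    """Smallest posterior-ranked set of variants reaching `level` probability."""
    log_bfs = [log_approx_bayes_factor(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(log_bfs)             # subtract the max to avoid overflow in exp()
    weights = [math.exp(lbf - top) for lbf in log_bfs]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    ranked = sorted(zip(ids, posteriors), key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant_id, posterior in ranked:
        chosen.append((variant_id, posterior))
        cumulative += posterior
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants at one locus.
    print(credible_set(["a", "b", "c"], [0.3, 0.05, 0.0], [0.03, 0.03, 0.03]))
```

A denser variant catalogue adds candidates that can share posterior mass, which is why credible set size depends on whether GoT2D, 1000G or HapMap variants are considered.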

Extended Data Figure 6 Genome enrichment analysis in GoT2D whole genome sequence data.

n = 2,657. a, Functional annotation categories were defined using transcription, chromatin state and transcription factor binding data from GENCODE, ENCODE and other studies. b, T2D association statistics for variants at each T2D locus were jointly modelled with functional annotation using fgwas. In the resulting model we identified enrichment of coding exons (CDS), transcription factor binding sites (TFBS), mature adipose active enhancers and promoters (hASC-t4 EnhA, TssA), pancreatic islet active and weak enhancers (HI EnhA, EnhWk), pre-adipose active and weak enhancers (hASC-t1 EnhA, EnhWk), embryonic stem cell active promoters (H1-hESC TssA) and 5′UTRs. Dots represent enrichment estimates and horizontal lines the 95% confidence intervals. c, At the CCND2 locus, three variants not present in HapMap2 have a combined 90% posterior probability of being causal (rs4238013, rs3217801, rs73040004). One of these variants, rs3217801, is a 2-bp indel that overlaps an islet enhancer element.

Extended Data Figure 7 Low frequency variants in exome array data.

Results from meta-analysis of 43,045 low-frequency and common coding variants on the exome array (assayed in 79,854 European subjects). a, Observed allelic ORs as a function of MAF. Variants missing in more than eight cohorts or polymorphic in only one cohort were excluded. Coloured lines are contours of liability variance explained. Grey-shaded regions denote ranges of OR and MAF consistent with 80% power (here at α = 5 × 10⁻⁷) to detect single-variant associations in this data set, given the observed range of missing data. Variants with a black collar are those that a bounding analysis indicates have a probability > 0.8 of explaining more than 0.1% of liability-scale variance (LVE). b, The distribution of each variant in MAF/OR space was computed assuming a T2D prevalence of 8% and beta and normal distributions for MAF and OR, respectively. Probabilities are obtained by integrating the joint MAF–OR distributions over ranges of LVE. c, Single-variant association, liability and bounding results for the known T2D GWAS variants on the exome array (see Methods).
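Panels a and b express effect sizes as liability-scale variance explained (LVE) at an assumed 8% prevalence. One standard way to convert an allelic OR and MAF into LVE is sketched below; it is not necessarily the calculation used for this figure. It assumes Hardy-Weinberg genotype frequencies, a multiplicative per-allele odds model calibrated so that population risk equals the prevalence, and a liability-threshold model with unit residual variance within each genotype class. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def liability_variance_explained(maf: float, odds_ratio: float,
                                 prevalence: float = 0.08) -> float:
    """Approximate fraction of liability variance explained by one variant."""
    # Hardy-Weinberg frequencies of genotypes carrying 0, 1 or 2 minor alleles.
    freqs = [(1.0 - maf) ** 2, 2.0 * maf * (1.0 - maf), maf ** 2]

    def genotype_risks(log_base_odds: float) -> list:
        risks = []
        for copies in range(3):
            odds = math.exp(log_base_odds) * odds_ratio ** copies
            risks.append(odds / (1.0 + odds))
        return risks

    # Bisection on the log baseline odds so population risk equals prevalence.
    low, high = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (low + high)
        risk = sum(f * r for f, r in zip(freqs, genotype_risks(mid)))
        if risk < prevalence:
            low = mid
        else:
            high = mid
    risks = genotype_risks(0.5 * (low + high))

    # Mean liability of each genotype class that reproduces its risk, given a
    # fixed threshold and unit residual variance (the threshold cancels out of
    # the variance; the prevalence-based value is used for readability).
    threshold = STD_NORMAL.inv_cdf(1.0 - prevalence)
    means = [threshold - STD_NORMAL.inv_cdf(1.0 - r) for r in risks]
    overall = sum(f * m for f, m in zip(freqs, means))
    between = sum(f * (m - overall) ** 2 for f, m in zip(freqs, means))
    # Between-genotype variance as a share of total liability variance.
    return between / (1.0 + between)


if __name__ == "__main__":
    # Illustrative inputs; compare against the 0.1% LVE cut-off in panel a.
    print(liability_variance_explained(0.05, 1.3))
```

For a fixed OR, LVE grows with MAF through the genotype variance 2·MAF·(1 − MAF), which is why the contours in panel a slope downward toward common variants.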

Extended Data Table 1 Summary information for sample sets used in the association analyses
Extended Data Table 2 Counts and properties of variants identified in sequenced subjects
Extended Data Table 3 Characterization of variant associations through conditional analysis
Extended Data Table 4 Testing for synthetic associations across GWAS-identified T2D loci

Supplementary information

Supplementary Information

This file contains Supplementary Tables and Figures 1–32 (see the separate Excel file for Supplementary Table 20) and the author contribution and acknowledgement lists. (PDF 23107 kb)

Supplementary Table 20

This file contains an overview of 634 genes at 81 GWAS-identified T2D loci. (XLSX 77 kb)

About this article

Cite this article

Fuchsberger, C., Flannick, J., Teslovich, T. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016). https://doi.org/10.1038/nature18642

  • DOI: https://doi.org/10.1038/nature18642
