  • Letter

Genetic analyses of diverse populations improves discovery for complex traits

Abstract

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1–3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4–10]. Additionally, effect sizes estimated in one population, and the risk prediction scores derived from them, may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, and replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.

Fig. 1: Inclusion of multi-ethnic samples enables discovery and replication in GWAS.
Fig. 2: Weaker effect sizes of previously published trait–variant associations in non-European populations exacerbates disparity in PVE.
Fig. 3: Fine-mapping with multi-ethnic PAGE versus homogeneous UK Biobank samples for height.

Data availability

Individual-level phenotype and genotype data are available through dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000356). Allele frequency data will be available for all genotyped sites on dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP/) and the University of Chicago Geography of Genetic Variants Browser (http://popgen.uchicago.edu/ggv/). Clinically relevant variant frequency data are available through ClinGen (https://curation.clinicalgenome.org/). Summary statistics for the genome-wide association study results are available through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).

References

  1. Need, A. C. & Goldstein, D. B. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 25, 489–494 (2009).

  2. Bustamante, C. D., Burchard, E. G. & De La Vega, F. M. Genomics for the world. Nature 475, 163–165 (2011).

  3. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).

  4. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).

  5. The SIGMA Type 2 Diabetes Consortium. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. J. Am. Med. Assoc. 311, 2305–2314 (2014).

  6. Gudmundsson, J. et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nat. Genet. 44, 1326–1329 (2012).

  7. Moltke, I. et al. A common Greenlandic TBC1D4 variant confers muscle insulin resistance and type 2 diabetes. Nature 512, 190–193 (2014).

  8. Kenny, E. E. et al. Melanesian blond hair is caused by an amino acid change in TYRP1. Science 336, 554 (2012).

  9. Manning, A. et al. A low-frequency inactivating AKT2 variant enriched in the Finnish population is associated with fasting insulin levels and type 2 diabetes risk. Diabetes 66, 2019–2032 (2017).

  10. Han, Y. et al. Prostate cancer susceptibility in men of African ancestry at 8q24. J. Natl Cancer Inst. 108, djv431 (2016).

  11. Carlson, C. S. et al. Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study. PLoS Biol. 11, e1001661 (2013).

  12. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).

  13. Liao, Y. et al. Surveillance of health status in minority communities — racial and ethnic approaches to community health across the U.S. (REACH U.S.) risk factor survey, United States, 2009. MMWR Surveill. Summ. 60, 1–44 (2011).

  14. Wojcik, G. L. et al. Imputation-aware tag SNP selection to improve power for large-scale, multi-ethnic association studies. G3 (Bethesda) 8, 3255–3267 (2018).

  15. Rosenberg, N. A. et al. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1, e70 (2005).

  16. Conomos, M. P. et al. Genetic diversity and association studies in US Hispanic/Latino populations: applications in the Hispanic community health study/study of Latinos. Am. J. Hum. Genet. 98, 165–184 (2016).

  17. Conomos, M. P., Miller, M. B. & Thornton, T. A. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet. Epidemiol. 39, 276–293 (2015).

  18. Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).

  19. Lin, D.-Y. et al. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 95, 675–688 (2014).

  20. Lin, D. Y. & Zeng, D. On the relative efficiency of using summary statistics versus individual-level data in meta-analysis. Biometrika 97, 321–332 (2010).

  21. Fadista, J., Manning, A. K., Florez, J. C. & Groop, L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur. J. Hum. Genet. 24, 1202–1205 (2016).

  22. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).

  23. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).

  24. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

  25. Shim, H. et al. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PLoS ONE 10, e0120758 (2015).

  26. Bien, S. A. et al. Strategies for enriching variant coverage in candidate disease loci on a multiethnic genotyping array. PLoS ONE 11, e0167758 (2016).

  27. Lacy, M. E. et al. Association of sickle cell trait with hemoglobin A1c in African Americans. J. Am. Med. Assoc. 317, 507–515 (2017).

  28. Lin, C.-N. et al. Effects of hemoglobin C, D, E, and S traits on measurements of HbA1c by six methods. Clin. Chim. Acta 413, 819–821 (2012).

  29. Mongia, S. K. et al. Effects of hemoglobin C and S traits on the results of 14 commercial glycated hemoglobin assays. Am. J. Clin. Pathol. 130, 136–140 (2008).

  30. Roberts, W. L. et al. Effects of hemoglobin C and S traits on glycohemoglobin measurements by eleven methods. Clin. Chem. 51, 776–778 (2005).

  31. Henn, B. M. et al. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl Acad. Sci. USA 108, 5154–5162 (2011).

  32. Baker, J. L., Shriner, D., Bentley, A. R. & Rotimi, C. N. Pharmacogenomic implications of the evolutionary history of infectious diseases in Africa. Pharmacogenomics J. 17, 112–120 (2017).

  33. Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018).

  34. Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).

  35. Collins, F. S. & Varmus, H. A new initiative on precision medicine. N. Engl. J. Med. 372, 793–795 (2015).

  36. Colby, S. L. & Ortman, J. M. Projections of the Size and Composition of the U.S. Population: 2014 to 2060 (United States Census Bureau, 2015).

  37. United Nations Population Fund. State of World Population 2016. http://www.unfpa.org/swop (2016).

  38. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  39. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  40. Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nat. Methods 9, 179–181 (2012).

  41. Howie, B. N., Donnelly, P. & Marchini, J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 5, e1000529 (2009).

  42. Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).

  43. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  44. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

Acknowledgements

The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI) with co-funding from the National Institute on Minority Health and Health Disparities (NIMHD). The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health (NIH). The PAGE consortium thanks the staff and participants of all PAGE studies for their contributions. We thank R. Williams and M. Ginoza for providing assistance with program coordination. The complete list of PAGE members can be found at http://www.pagestudy.org. Assistance with data management, data integration, data dissemination, genotype imputation, ancestry deconvolution, population genetics, analysis pipelines and general study coordination was provided by the PAGE Coordinating Center (NIH U01HG007419). Genotyping services were provided by the Center for Inherited Disease Research (CIDR). The CIDR is fully funded through a federal contract from the NIH to The Johns Hopkins University, contract number HHSN268201200008I. Genotype data quality control and quality assurance services were provided by the Genetic Analysis Center in the Biostatistics Department of the University of Washington, through support provided by the CIDR contract. The data and materials included in this report result from collaboration between the following studies and organizations: BioMe Biobank, HCHS/SOL, MEC, PAGE Global Reference Panel and WHI. Their funding is listed below and additional acknowledgements can be found in Supplementary Information 12. The BioMe Biobank received funding for the PAGE IPM BioMe Biobank study through the National Human Genome Research Institute (NIH U01HG007417). Primary funding support to K.E.N., M.G., R.T., H.M.H., C.L.A., C.J.H., A.E.J., B.M.L., M.A.R., K.L.Y., E.B., L.F., M.F., G.H., D.L., C.L.W. and S.Y. (as part of HCHS/SOL) is provided by U01HG007416. 
Additional support was provided via R01DK101855 and 15GRNT25880008. The HCHS/SOL study was carried out as a collaborative study supported by contracts from the National Heart, Lung and Blood Institute (NHLBI) to the University of North Carolina (N01-HC65233), University of Miami (N01-HC65234), Albert Einstein College of Medicine (N01-HC65235), Northwestern University (N01-HC65236) and San Diego State University (N01-HC65237). The Multiethnic Cohort study (MEC) characterization of epidemiological architecture is funded through the NHGRI PAGE program (NIH U01 HG007397). The MEC study is funded through the National Cancer Institute U01 CA164973. The Stanford Global Reference Panel was created by Stanford-contributed samples and comprises multiple datasets from multiple researchers across the world designed to provide a resource for any researchers interested in diverse population data on the Multi-Ethnic Global Array (MEGA), funded by the NHGRI PAGE program (NIH U01HG007419). The authors thank the researchers and research participants who made this dataset available to the community. Funding support for the ‘Exonic variants and their relation to complex traits in minorities of the WHI’ study is provided through the NHGRI PAGE program (NIH U01HG007376). The WHI program is funded by the NHLBI, NIH, US Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C and HHSN271201100004C. K.K.N. was supported by the Cancer Prevention Training Grant in Nutrition, Exercise and Genetics R25CA094880 from the National Cancer Institute. C.R.G. was supported by NHGRI training grant T32 HG000044. H.M.H. was supported by NHLBI training grant T32 HL007055. A.E.J. was supported by NIH 5K99HL130580-02 and NIH L60 MD008384-02. K.L.Y. was supported by NCATS KL2TR001109. J.M.K. was supported by KL2TR000421. R.W.W. was supported by NIH 5T32HD049311-07. D.-Y.L. was supported by R01CA082659, R01GM047845 and P01CA142538. L.F.-R. was supported by NICHD training grant T32 HD007168 and P2C HD050924. T.A.T. was supported by P01GM099568.

Reviewer information

Nature thanks André G. Uitterlinden and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

L.A.H., S.B., C.A.H., C.K., L.L.M., R.J.F.L., T.C.M., K.E.N., U.P., E.E.K. and C.S.C. provided overall project supervision and management. G.L.W., J.H., C.R.G., N.Z., S.B., J.M.K., E.P.S., K.V., G.M.B., R.W.W., C. Schurmann, A.S., A.M.-E., C.A.W., E.P.B., S.C.-Q., V.A.-A., S.A.B., M.H.P., M.F., C.D.B., L.C.P., J.R., K.D., M.P.C., X.S., C.A.L., C.C.L., R.D., G.N., E.B., S.C.N., C.K., U.P., E.E.K. and C.S.C. carried out genotyping experiments and quality control. M.G., K.K.N., J.H., H.M.H., Y.M.P., A.E.J., C.J.H., C.L.W., C.L.A., K.L.Y., M.A.R., N.Z., S.B., J.M.K., I.C., V.W.S., G.M.B., C. Schurmann, A.V., M.H.P., G.H., L.F.-R., M.F., A.P.R., L.R.W., R.D.J., S.Y., U.L., Y.H., Y. Lu, S.-S.L.P., C.C., R.D., G.N., E.B., S.B., C.K., L.L.M., U.P. and E.E.K. carried out phenotype-harmonization studies. G.L.W., M.G., K.K.N., R.T., J.H., C.R.G., H.M.H., Y.M.P., A.E.J., B.M.L., C.J.H., C.L.W., C.L.A., K.L.Y., M.A.R., S.B., J.M.K., I.C., V.W.S., E.P.S., G.M.B., M.V., R.D.J., S.Y., U.L., Y.H., S.A.B., C. Sabatti, L.M.H., P.J.N., S.C., Y. Lu, D.-Y.L., T.A.T., J.L.A., D.O.S., Y. Li, S.-S.L.P., C.K., U.P., E.E.K. and C.S.C. carried out association analyses. G.L.W., M.G., K.K.N., R.T., J.H., C.R.G., H.M.H., Y.M.P., A.E.J., B.M.L., C.J.H., C.L.W., C.L.A., K.L.Y., M.A.R., J.M.K., I.C., V.W.S., E.P.S., R.W.W., A.V., Y.H., S.A.B., P.J.N., S.C., L.M.H., D.-Y.L., G.H., A.P.R., T.A.T., D.O.S., L.A.H., R.D., G.N., E.A.S., S.B., C.A.H., C.K., L.L.M., R.J.F.L., T.C.M., K.E.N., U.P., E.E.K. and C.S.C. prepared the manuscript.

Corresponding authors

Correspondence to Eimear E. Kenny or Christopher S. Carlson.

Ethics declarations

Competing interests

C.D.B. is a member of the scientific advisory boards for Liberty Biosecurity, Personalis, 23andMe Roots into the Future, Ancestry.com, IdentifyGenomics and Etalon, and is a founder of CDB Consulting. C.R.G. and B.M.H. own stock in 23andMe. E.E.K. and C.R.G. are members of the scientific advisory board for Encompass Bioscience. E.E.K. consults for Illumina.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Number of unique participants in the GWAS Catalog from 2006 to 2017 (inclusive).

We observed that—although the number of unique participants (in millions) in the GWAS Catalog has grown substantially over the past decade—the relative proportion of participants of non-European descent has remained constant, with the majority of progress within Asian populations.

Extended Data Fig. 2 Correlation between SNP genotype and PC1–PC10.

a, The correlation (r²) for novel and residual loci, calculated by obtaining the individual-level data for all PAGE participants and correlating the SNP genotype with each of the ten PCs. The correlation between each locus and each of the ten PCs is plotted on the y axis; novel loci are plotted in grey and residual loci in yellow. We observed an especially high correlation between a novel locus and PC4, which represents Native Hawaiian/Pacific Islander ancestry. b, The individual-level data for all PAGE participants were obtained and plotted in a parallel coordinates plot, such that each PAGE individual is represented by a set of line segments connecting their eigenvalues. This allows us to see which race/ethnicity groups are differentiated at each PC. For example, we see predominantly green lines as outliers for PC4, which indicates that this vector represents a continuum of Native Hawaiian/Pacific Islander ancestry.
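
The per-locus calculation described for panel a can be sketched as follows. This is a minimal illustration on simulated data, not the PAGE analysis pipeline: the helper names `pearson_r` and `snp_pc_r2`, the simulated PCs and the simulated genotypes are all assumptions introduced here.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def snp_pc_r2(genotypes, pcs):
    """Return r^2 between one SNP's dosages and each PC.

    genotypes: allele dosages (0, 1 or 2), one per individual.
    pcs: list of PC columns; pcs[k][i] is individual i's score on PC k+1.
    """
    return [pearson_r(genotypes, pc) ** 2 for pc in pcs]


# Simulated data (hypothetical): the SNP's allele frequency follows PC4.
random.seed(0)
n = 2000
pcs = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(10)]
genotypes = []
for i in range(n):
    freq = 1.0 / (1.0 + math.exp(-2.0 * pcs[3][i]))  # frequency driven by PC4
    # Dosage = number of alternate alleles out of two independent draws.
    genotypes.append(sum(random.random() < freq for _ in range(2)))

r2 = snp_pc_r2(genotypes, pcs)
top_pc = max(range(10), key=lambda k: r2[k]) + 1  # 1-based PC index
print(f"PC{top_pc} has the highest r^2: {r2[top_pc - 1]:.2f}")
```

In this toy setting, a variant whose frequency tracks a single ancestry axis shows a high r² with that PC and near-zero r² with the others, which is the pattern the caption reports for the novel locus and PC4.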

Extended Data Table 1 GWAS Catalog heterogeneity by trait, including number of novel and secondary findings
Extended Data Table 2 Results of the meta-analysis

Supplementary information

Supplementary Information

This file includes detailed descriptions of PAGE participating studies, phenotype harmonization, genotyping and imputation, population substructure characterization, the comparison of meta- and mega-analyses, extended statistical methods, and characterization of clinically-relevant variants, with 17 Supplementary Figures. Acknowledgements not included in the main text are also listed.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1–7: Supplementary Table 1 Phenotypes in PAGE, both combined and stratified by self-identified race/ethnicity; Supplementary Table 2 Results from SUGEN and GENESIS of novel and secondary loci reaching genome-wide significance across all 26 traits; Supplementary Table 3 Results from SUGEN stratified by self-identified race/ethnicity and combined in fixed-effects meta-analysis for all novel and secondary loci across all 26 traits; Supplementary Table 4 Results from SUGEN and GENESIS for all previously reported loci in the combined sample (mega-analysis) for each continuous trait; Supplementary Table 5 All known variants with reference information for the indicated traits. This includes rsID, PubmedID, citation, sample descriptors (both discovery and replication), and reported gene; Supplementary Table 6 Bibliography and study descriptors for the largest published manuscript by trait in the NHGRI-EBI GWAS Catalog; Supplementary Table 7 Comparison of effect sizes (both as-published and standardized for sample size) of previously reported trait-loci associations between the NHGRI-EBI GWAS Catalog and PAGE GWAS results.
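
For readers unfamiliar with the fixed-effects meta-analysis mentioned for Supplementary Table 3, the following is a minimal sketch of the standard inverse-variance-weighted estimator, together with Cochran's Q as a simple heterogeneity statistic. The function name `fixed_effects_meta` and the per-group numbers are made up for illustration and are not PAGE results; the software and settings used in the study may differ.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas, ses: per-group effect estimates and their standard errors.
    Returns (pooled_beta, pooled_se, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # under homogeneity it follows a chi-square with len(betas) - 1 d.f.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, q


# Hypothetical per-group estimates for one variant (not PAGE results).
betas = [0.10, 0.20, 0.15]
ses = [0.05, 0.05, 0.10]
beta, se, q = fixed_effects_meta(betas, ses)
z = beta / se
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} Q={q:.2f}")
```

Groups with smaller standard errors receive larger weights, so the pooled estimate is dominated by the best-powered strata; a large Q relative to its degrees of freedom suggests that effect sizes differ between groups.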

About this article

Cite this article

Wojcik, G.L., Graff, M., Nishimura, K.K. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019). https://doi.org/10.1038/s41586-019-1310-4

  • DOI: https://doi.org/10.1038/s41586-019-1310-4
