Abstract
Characterizing genetic influences on DNA methylation (DNAm) provides an opportunity to understand mechanisms underpinning gene regulation and disease. In the present study, we describe results of DNAm quantitative trait locus (mQTL) analyses on 32,851 participants, identifying genetic variants associated with DNAm at 420,509 DNAm sites in blood. We present a database of >270,000 independent mQTLs, of which 8.5% comprise long-range (trans) associations. Identified mQTL associations explain 15–17% of the additive genetic variance of DNAm. We show that the genetic architecture of DNAm levels is highly polygenic. Using shared genetic control between distal DNAm sites, we constructed networks, identifying 405 discrete genomic communities enriched for genomic annotations and complex traits. Shared genetic variants are associated with both DNAm levels and complex diseases, but only in a minority of cases do these associations reflect causal relationships from DNAm to trait or vice versa, indicating a more complex genotype–phenotype map than previously anticipated.
Data availability
A database of our results is available as a resource to the community at http://mqtldb.godmc.org.uk. The individual-level genotype and DNAm data are available by request from each individual study or can be downloaded from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo), European Genome–Phenome Archive (EGA, https://ega-archive.org) or Array Express (https://www.ebi.ac.uk/arrayexpress). As the consent for most studies requires the data to be under managed access, the individual-level genotype and DNAm data are not available from a public repository unless stated.
ALS BATCH1 and -2 data are available to researchers by request as outlined in the Project MinE access policy. ARIES data are available to researchers by request from the Avon Longitudinal Study of Parents and Children Executive Committee (http://www.bristol.ac.uk/alspac/researchers/access) as outlined in the study’s access policy (http://www.bristol.ac.uk/media-library/sites/alspac/documents/researchers/data-access/ALSPAC_Access_Policy.pdf). BAMSE data are available from the GABRIEL consortium and on request from EGA under accession no. EGAC00001000786. BASICMAR DNAm data are available under accession no. GSE69138. Born-in-Bradford data are available to researchers who submit an expression of interest to the Born-in-Bradford Executive Group (https://borninbradford.nhs.uk/research). BSGS DNAm data are available under accession no. GSE56105. GOYA data are available by request from DNBC: https://www.dnbc.dk. Dunedin data are available via a managed access system (contact: ac115@duke.edu). E-Risk DNAm data are available under accession no. GSE105018. Estonian biobank (ECGUT) data can be accessed on ethical approval by submitting a data release request to the Estonian Genome Center, University of Tartu (http://www.geenivaramu.ee/en/access-biopank/data-access). EPIC-Norfolk data can be accessed by contacting the study management committee: http://www.srl.cam.ac.uk/epic/contact. Requests for access to EPICOR data may be sent to Professor Giuseppe Matullo (giuseppe.matullo@unito.it). FTC data can be accessed on approval from the Data Access Committee of the Institute for Molecular Medicine Finland (FIMM) (fimm-dac@helsinki.fi). Requests for Generation R data access are evaluated by the Generation R Management Team. Researchers can obtain a de-identified GLAKU dataset after obtaining approval from the GLAKU Study Board. GSK DNAm data are available under accession no. GSE125105. 
INMA data are available by request from the INfancia y Medio Ambiente Executive Committee for researchers who meet the criteria for access to confidential data. IOW F2 data are available by request from the Isle of Wight Third Generation Study. Please contact Mr Stephen Potter (stephen.potter@iow.nhs.uk). LLS DNAm data were submitted to the EGA under accession no. EGAS00001001077. LBC1921 and LBC1936 data are available on request from the Lothian Birth Cohort Study, Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh (I.Deary@ed.ac.uk). DNAm data from MARTHA participants are available under accession no. E-MTAB-3127. NTR DNAm data are available on request from EGA under accession no. EGAD00010000887. PIAMA data are available on request. Requests can be submitted to the PIAMA Principal Investigators (https://piama.iras.uu.nl/english). PRECISESADS data are available through ELIXIR at https://doi.org/10.17881/th9v-xt85. Collaboration in data analysis of PREDO is possible through specific research proposals sent to the PREDO Study Board (predo.study@helsinki.fi) or the principal investigators Katri Räikkönen (katri.raikkonen@helsinki.fi) or Hannele Laivuori (hannele.laivuori@helsinki.fi). Data are available on request at Project MinE (https://www.projectmine.com). Raine data are available on request (https://ross.rainestudy.org.au). Requests for access to Rotterdam Study data may be sent to Frank van Rooij (f.vanrooij@erasmusmc.nl). SABRE data are available by request from SABRE (https://www.sabrestudy.org). SCZ1 DNAm data are available under accession no. GSE80417. SCZ2 DNAm data are available under accession no. GSE84727. SYS data are available on request addressed to Dr. Zdenka Pausova (zdenka.pausova@sickkids.ca) and Dr. Tomas Paus (tpausresearch@gmail.com). Further details about the protocol can be found at http://www.saguenay-youth-study.org. TwinsUK DNAm data are available in the GEO under accession nos. GSE62992 and GSE121633. 
TwinsUK adipose DNAm data are stored in ArrayExpress under accession no. E-MTAB-1866. Access to additional individual-level genotype and phenotype data can be applied for through the TwinsUK data access committee: http://twinsuk.ac.uk/resources-for-researchers/access-our-data. Individual-level DNAm and genetic data from the UK Household Longitudinal Study are available on application through the EGA under accession no. EGAS00001001232. Nonidentifiable Generation Scotland data will be made available to researchers through the GS:SFHS Access Committee. MESA DNAm data are available under accession nos. GSE56046 and GSE56581. Tissue DNAm data are available under accession no. GSE78743. Brain DNAm data can be found under accession no. GSE58885.
Cohort descriptions and further contact details can be found in the Supplementary Note.
For the enrichments, we used chromatin states from the Epigenome Roadmap (https://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/imputed12marks/jointModel/final), TFBSs from the ENCODE project (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeAwgTfbsUniform) downloaded from the LOLA core database (http://databio.org/regiondb), and gene annotations from https://zwdzwd.github.io/InfiniumAnnotation or GARFIELD (https://www.ebi.ac.uk/birney-srv/GARFIELD). To extract GWA signals for co-localization, we used the MRBase database (https://www.mrbase.org).
Code availability
Datasets were processed using https://github.com/perishky/meffil unless stated otherwise. Individual study analysts used the GitHub pipeline https://github.com/MRCIEU/godmc to conduct the mQTL analysis. We used https://github.com/MRCIEU/godmc_phase1_analysis for the phase 1 analysis, https://github.com/explodecomputer/random-metal for the meta-analyses and https://github.com/MRCIEU/godmc_phase2_analysis for the follow-up analyses.
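The meta-analyses were run with random-metal, and the reference list cites both METAL and the DerSimonian–Laird random-effects method. The sketch below is not that tool's code. It is a minimal illustration, assuming each cohort contributes an effect size (beta) and standard error for a single SNP-DNAm site pair, of an inverse-variance fixed-effect estimate and a DerSimonian–Laird random-effects estimate. The function names and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MetaResult:
    beta: float
    se: float
    p: float


def _two_sided_p(z: float) -> float:
    # erfc keeps precision for the very small p-values typical of mQTLs
    return math.erfc(abs(z) / math.sqrt(2.0))


def fixed_effect(betas, ses) -> MetaResult:
    """Inverse-variance weighted fixed-effect meta-analysis."""
    w = [1.0 / s ** 2 for s in ses]
    beta = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return MetaResult(beta, se, _two_sided_p(beta / se))


def dersimonian_laird(betas, ses):
    """Random-effects meta-analysis; returns (MetaResult, tau2)."""
    w = [1.0 / s ** 2 for s in ses]
    fe = fixed_effect(betas, ses)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (b - fe.beta) ** 2 for wi, b in zip(w, betas))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-cohort variance (tau^2), truncated at zero
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return MetaResult(beta, se, _two_sided_p(beta / se)), tau2


# Hypothetical per-cohort estimates for one SNP-DNAm site pair
betas = [0.21, 0.18, 0.25, 0.30]
ses = [0.03, 0.05, 0.04, 0.06]
print(fixed_effect(betas, ses))
print(dersimonian_laird(betas, ses))
```

When the cohort estimates agree, tau^2 is zero and the two estimates coincide; between-cohort heterogeneity inflates the random-effects standard error relative to the fixed-effect one.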
References
Petronis, A. Epigenetics as a unifying principle in the aetiology of complex traits and diseases. Nature 465, 721–727 (2010).
van Dongen, J. et al. Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat. Commun. 7, 11115 (2016).
Hannon, E. et al. Characterizing genetic and environmental influences on variable DNA methylation using monozygotic and dizygotic twins. PLoS Genet. 14, e1007544 (2018).
Kerkel, K. et al. Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation. Nat. Genet. 40, 904–908 (2008).
Schadt, E. E. et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 422, 297–302 (2003).
Davey Smith, G. & Hemani, G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23, R89–R98 (2014).
Gaunt, T. R. et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 17, 61 (2016).
Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).
Hannon, E. et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat. Neurosci. 19, 48–54 (2016).
Hop, P. J. et al. Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference. Genome Biol. 21, 220 (2020).
Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Pidsley, R. et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 17, 208 (2016).
Bibikova, M. et al. High density DNA methylation array with single CpG site resolution. Genomics 98, 288–295 (2011).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Shah, S. et al. Genetic and environmental exposures constrain epigenetic drift over the human life course. Genome Res. 24, 1725–1733 (2014).
Gutierrez-Arcelus, M. et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. Elife 2, e00523 (2013).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e24 (2016).
McRae, A. F. et al. Identification of 55,000 replicated DNA methylation QTL. Sci. Rep. 8, 17605 (2018).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Wahl, S. et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 541, 81–86 (2017).
Elliott, G. et al. Intermediate DNA methylation is a conserved signature of genome regulation. Nat. Commun. 6, 6363 (2015).
Feldmann, A. et al. Transcription factor occupancy can mediate active turnover of DNA methylation at regulatory regions. PLoS Genet. 9, e1003994 (2013).
Grundberg, E. et al. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am. J. Hum. Genet. 93, 876–890 (2013).
Kim-Hellmuth, S. et al. Cell type-specific genetic regulation of gene expression across human tissues. Science 369, eaaz8528 (2020).
Qi, T. et al. Identifying gene targets for brain-related traits using transcriptomic and methylomic data from blood. Nat. Commun. 9, 2282 (2018).
Yin, Y. et al. Impact of cytosine methylation on DNA binding specificities of human transcription factors. Science 356, eaaj2239 (2017).
Domcke, S. et al. Competition between DNA methylation and transcription factors determines binding of NRF1. Nature 528, 575–579 (2015).
Baubec, T. et al. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature 520, 243–247 (2015).
Ginno, P. A. et al. A genome-scale map of DNA methylation turnover identifies site-specific dependencies of DNMT and TET activity. Nat. Commun. 11, 2680 (2020).
Sánchez-Castillo, M. et al. CODEX: a next-generation sequencing experiment database for the haematopoietic and embryonic stem cell communities. Nucleic Acids Res. 43, D1117–D1123 (2015).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Waszak, S. M. et al. Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050 (2015).
Viny, A. D. et al. Dose-dependent role of the cohesin complex in normal and malignant hematopoiesis. J. Exp. Med. 212, 1819–1832 (2015).
Battle, A. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Kumasaka, N., Knights, A. J. & Gaffney, D. J. High-resolution genetic mapping of putative causal interactions between regions of open chromatin. Nat. Genet. 51, 128–137 (2019).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Delaneau, O. et al. Chromatin three-dimensional interactions mediate genetic effects on gene expression. Science 364, eaat8266 (2019).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. https://doi.org/10.1038/s41588-021-00913-z (2021).
Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429.e19 (2016).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Tachmazidou, I. et al. Whole-genome sequencing coupled to imputation discovers genetic signals for anthropometric traits. Am. J. Hum. Genet. 100, 865–884 (2017).
Kato, N. et al. Trans-ancestry genome-wide association study identifies 12 genetic loci influencing blood pressure and implicates a role for DNA methylation. Nat. Genet. 47, 1282–1293 (2015).
Iotchkova, V. et al. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals. Nat. Genet. 51, 343–353 (2019).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Reinius, L. E. et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE 7, e41361 (2012).
Houseman, E. A. et al. Model-based clustering of DNA methylation array data: a recursive-partitioning algorithm for high-dimensional data arising as a mixture of beta distributions. BMC Bioinform. 9, 365 (2008).
Hemani, G., Tilling, K. & Davey Smith, G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS Genet. 13, e1007081 (2017).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Zheng, J. et al. Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nat. Genet. 52, 1122–1131 (2020).
Richardson, T. G. et al. Systematic Mendelian randomization framework elucidates hundreds of CpG sites which may mediate the influence of genetic variants on disease. Hum. Mol. Genet. 27, 3293–3304 (2018).
Hemani, G., Bowden, J. & Davey Smith, G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum. Mol. Genet. 27, R195–R208 (2018).
Brion, M. J., Shakhbazov, K. & Visscher, P. M. Calculating statistical power in Mendelian randomization studies. Int. J. Epidemiol. 42, 1497–1501 (2013).
Pierce, B. L. & Burgess, S. Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators. Am. J. Epidemiol. 178, 1177–1184 (2013).
Hemani, G. et al. The MR-base platform supports systematic causal inference across the human phenome. Elife 7, e34408 (2018).
Dekkers, K. F. et al. Blood lipids influence DNA methylation in circulating cells. Genome Biol. 17, 138 (2016).
Braun, K. V. E. et al. Epigenome-wide association study (EWAS) on lipids: the Rotterdam study. Clin. Epigenet. 9, 15 (2017).
Simpson, J. T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nat. Methods 14, 407–410 (2017).
Zhou, W., Laird, P. W. & Shen, H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 45, e22 (2017).
Winkler, T. W. et al. Quality control and conduct of genome-wide association meta-analyses. Nat. Protoc. 9, 1192–1212 (2014).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Conomos, M. P., Reiner, A. P., Weir, B. S. & Thornton, T. A. Model-free estimation of recent genetic relatedness. Am. J. Hum. Genet. 98, 127–148 (2016).
Min, J. L., Hemani, G., Davey Smith, G., Relton, C. & Suderman, M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics 34, 3983–3989 (2018).
Zeilinger, S. et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS ONE 8, e63812 (2013).
Aulchenko, Y. S., de Koning, D. J. & Haley, C. Genomewide rapid association using mixed model and regression: a fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics 177, 577–585 (2007).
Chen, Y. A. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 8, 203–209 (2013).
Naeem, H. et al. Reducing the risk of false discovery enabling identification of biologically significant genome-wide methylation status using the HumanMethylation450 array. BMC Genom. 15, 51 (2014).
Price, M. E. et al. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenet. Chromatin 6, 4 (2013).
Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
Dahl, A., Guillemot, V., Mefford, J., Aschard, H. & Zaitlen, N. Adjusting for principal components of molecular phenotypes induces replicating false positives. Genetics 211, 1179–1189 (2019).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
DerSimonian, R. & Laird, N. Meta-analysis in clinical trials. Control Clin. Trials 7, 177–188 (1986).
Hedges, L. V. & Olkin, I. Statistical Methods for Meta-Analysis 189–203 (Academic Press, 1985).
Acknowledgements
C.L.R., G.D.S., G.S., J.L.M., K.B., M. Suderman, T.G.R. and T.R.G. are supported by the UK Medical Research Council (MRC) Integrative Epidemiology Unit at the University of Bristol (MC_UU_00011/1, MC_UU_00011/4, MC_UU_00011/5). C.L.R. receives support from a Cancer Research UK Programme grant (no. C18281/A191169). G.H. is funded by the Wellcome Trust and the Royal Society (208806/Z/17/Z). E.H. and J.M. were supported by MRC project grants (nos. MR/K013807/1 and MR/R005176/1 to J.M.) and an MRC Clinical Infrastructure award (no. MR/M008924/1 to J.M.). B.T.H. is supported by the Netherlands CardioVascular Research Initiative (the Dutch Heart Foundation, Dutch Federation of University Medical Centres, the Netherlands Organisation for Health Research and Development, and the Royal Netherlands Academy of Sciences) for the GENIUS project ‘Generating the best evidence-based pharmaceutical targets for atherosclerosis’ (CVON2011-19, CVON2017-20). J.T.B. was supported by the Economic and Social Research Council (grant no. ES/N000404/1). The present study was also supported by the JPI HDHL-funded DIMENSION project (administered by the BBSRC UK, grant no. BB/S020845/1 to J.T.B., and by ZonMW the Netherlands, grant no. 529051021 to B.T.H.). A.D.B. has been supported by a Wellcome Trust PhD Training Fellowship for Clinicians and the Edinburgh Clinical Academic Track programme (204979/Z/16/Z). J. Klughammer was supported by a DOC fellowship of the Austrian Academy of Sciences. Cohort-specific acknowledgements and funding are presented in the Supplementary Note.
Author information
Contributions
G.H., G.S. and J.L.M. managed the project. A.A.C., A. Caspi, A.D.H., A.G.U., A. Metspalu, A. Murray, A.M.M., B.B., B.T.H., C.H., C.L.R., C.P., C. Sacerdote, C. Shaw, C. Söderhäll, D.A.L., D.v.H., D.I.B., D.-A.T., E.A.N., E.B.B., E.J.C.d.G., E.M., F.G., F.R., G.E.D., G.H.K., G.P., G.W.M., H.R.E., H.T., H.Z., I.J.D., J.F.F., J.H.V., J.J.-C., J. Kaprio, J.L., J.M., J.M.S., J.M.V., J.v.M., J.R., J.R.B.P., J.R.G., J. Shin, J.T.B., J.W., J.W.H., K.K.O., K.L.E., K.R., L.A., L.C.S., L.M., M.A.I., M. Beekman, M. Bustamante, M.E.A.-R., M.H.v.IJ., M. Kerick, M.O., N.C., N.G.M., N.J.W., N.R.W., P.E.S., P.-E.M., P.M.V., R.-C.H., R.P., S.L., S.P., T.D.S., T.E., T.E.M., T.I.A.S., T.P., T.T., V.W.V.J., W.K. and Z.P. designed individual studies and contributed data. A.A.K., A.I., A.S., B.C., C.S.M., H.R.E., J.L.M., K.B., K.M.H., N.K., S.M.R., T.H., R.M.W. and W.L.M. generated and/or quality-controlled data. G.H., J.L.M., M. Suderman, T.R.G. and V.I. designed new statistical or bioinformatics tools. A.D.B., A. Cardona, A.D., A.F.M., A.K., B.T.H., C.B., C.H., C.L.R., C.R.-A., C.S.-T., C.V., C.-J.X., C.W., D.A., D.C., D.J.L., D.L.C., D.M., E.C.-M., E.G.-S., E.H., E.M., F.C.-M., F.I.R., F.R.D., G.B., G.C., G.D.S., G.H., G.H.K., G.M., G.W., I.Y., J.C.-F., J.v.D., J.-J.H., J. Kaprio, J. Klughammer, J.L.M., J.M., J. Sunyer, J.T.B., K.B., K.v.E., K.F.D., K.S., L.C.S., M. Bernard, M. Bustamante, M.H.v.IJ., M.G., M. Kumari, M.L., M. Smart, M. Suderman, N.K., P. Melton, P. Mandaviya, P.M.V., R.E.M., R.G., R.L., R.Z., S.B., S.G., S.K., T.-K.C., T.G.-S., T.G.R., T.I.A.S., T.L., T.R.G., Y.A., Y.Z., V.I. and V.S. analyzed the data and/or provided critical interpretation of results. B.T.H., C.B., C.L.R., J.M., J.T.B. and T.R.G. designed and/or managed the study. A.D.B., B.T.H., C.B., C.L.R., D.J.L., E.C.-M., E.H., G.D.S., G.H., J.C.-F., J. Klughammer, J.L.M., J.M., J.T.B., K.B., K.F.D., M. Suderman, P.M.V., R.L., T.G.R., T.R.G. and V.I. wrote the manuscript.
Ethics declarations
Competing interests
T.R.G. receives funding from GlaxoSmithKline and Biogen for unrelated research. The other authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Quality control of 36 studies.
We used 337 independent SNPs on chromosome 20 with p<1e-14. The number of SNPs used for each study is indicated in the bottom plot. a, M statistic (Magosi et al., PLoS Genet. 13, e1006755 (2017)) for each of the 36 cohorts. b, Boxplot of mQTL effect sizes for each of the 36 studies. The center line of each boxplot corresponds to the median value. The lower and upper box limits indicate the first and third quartiles (the 25th and 75th percentiles). The whiskers extend to values up to 1.5 times the IQR in either direction.
Extended Data Fig. 2 Distance of SNP from DNAm site.
a, Density plot of the distance of SNP from DNAm site against the -log10 p-value of 4,533 intrachromosomal trans-mQTL associations (>1Mb). b, Density plot of the distance of SNP from DNAm site against the -log10 p-value of 248,607 cis-mQTL associations (<1Mb).
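The cis/trans split in this figure rests on a 1 Mb distance rule. A minimal sketch of that rule, assuming chromosome labels and base-pair positions for each SNP and DNAm site; the function name is hypothetical, and treating a distance of exactly 1 Mb as trans is an assumption, since the caption only gives <1 Mb and >1 Mb.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb boundary between cis and trans, as in the caption


def classify_mqtl(snp_chr: str, snp_pos: int, site_chr: str, site_pos: int) -> str:
    """Classify an SNP-DNAm site pair as cis, intrachromosomal trans or interchromosomal trans."""
    if snp_chr != site_chr:
        return "trans_inter"
    distance = abs(snp_pos - site_pos)
    # A distance of exactly 1 Mb is treated as trans here (an assumption)
    return "cis" if distance < CIS_WINDOW_BP else "trans_intra"


# Hypothetical positions (bp)
print(classify_mqtl("20", 1_500_000, "20", 1_750_000))  # cis
print(classify_mqtl("20", 1_500_000, "20", 9_000_000))  # trans_intra
print(classify_mqtl("20", 1_500_000, "7", 1_500_000))   # trans_inter
```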
Extended Data Fig. 3 Effect sizes and weighted standard deviation (SD) for each mQTL category.
a, For each DNAm site, the strongest absolute effect size (the maximum absolute additive change in DNAm level, measured in SD per allele) was selected. Kernel density estimates of the effect sizes are shown for all sites with an mQTL (n=190,102), sites with cis-only effects (n=170,986), cis effects for sites with both cis and trans effects (n=11,902), trans effects for sites with both cis and trans effects (n=11,902) and sites with trans-only effects (n=7,214). Comparing the strongest effect size for each site in a two-sided linear regression model showed that cis+trans sites had larger cis effect sizes than cis-only sites (per allele SD change = 0.05 (s.e.= 0.002), p<2e-16) and weaker trans effect sizes than trans-only sites (per allele SD change = −0.06 (s.e.= 0.002), p<2e-16). To detect these small trans effect sizes at sites with both a cis and a trans association, it is crucial to regress out the cis effect, which decreases the residual variance and improves power to detect a trans effect. b, The violin plots represent kernel density estimates of the weighted SD across 36 cohorts for each DNAm site. The center line of the boxplot in each violin plot corresponds to the median value. The lower and upper box limits indicate the first and third quartiles (the 25th and 75th percentiles). The whiskers extend to values up to 1.5 times the IQR in either direction.
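The point about regressing out the cis effect can be illustrated with a small simulation. The sketch below is hypothetical: effect sizes, allele frequency and sample size are arbitrary choices, not GoDMC values, and the helper names are invented. It uses the Frisch–Waugh–Lovell approach, regressing the cis SNP out of both the DNAm level and the trans genotype before testing the trans SNP. Because the cis SNP no longer contributes to the residual variance, the standard error of the trans effect shrinks.

```python
import math
import random


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return slope, its standard error and residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    rss = sum(r * r for r in resid)
    return b, math.sqrt(rss / (n - 2) / sxx), resid


def simulate(n=5000, beta_cis=1.0, beta_trans=0.05, maf=0.3, seed=1):
    """Hypothetical DNAm levels driven by one strong cis SNP and one weak trans SNP."""
    rng = random.Random(seed)

    def genotype():
        return sum(rng.random() < maf for _ in range(2))  # 0, 1 or 2 alleles

    g_cis = [genotype() for _ in range(n)]
    g_trans = [genotype() for _ in range(n)]
    y = [beta_cis * c + beta_trans * t + rng.gauss(0, 1) for c, t in zip(g_cis, g_trans)]
    return g_cis, g_trans, y


g_cis, g_trans, y = simulate()
# Marginal test: the cis effect stays in the residual variance
b_marg, se_marg, _ = simple_ols(g_trans, y)
# Conditional test: regress the cis SNP out of both y and the trans genotype
_, _, y_resid = simple_ols(g_cis, y)
_, _, t_resid = simple_ols(g_cis, g_trans)
b_cond, se_cond, _ = simple_ols(t_resid, y_resid)
print(f"marginal:    beta={b_marg:.3f} se={se_marg:.4f}")
print(f"conditional: beta={b_cond:.3f} se={se_cond:.4f}")
```

With these settings the cis SNP explains a substantial share of the variance, so the conditional standard error is noticeably smaller and the trans test gains power.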
Extended Data Fig. 4 Impact of the two-stage design on mQTL coverage.
a, Loss in power in the two-stage design. We calculated the power of detecting a cis association in at least one of the 22 studies at p<1e-5 or a trans association in at least two of the 22 studies at p<1e-5. b, Expected number of mQTLs. Using the number of mQTLs with a particular r2 value and the power of detecting mQTLs with that r2 value, we calculated how many mQTLs we would expect to exist with that value.
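The power calculation in this figure can be sketched as follows, assuming a 1-df association test whose non-centrality is n·r2/(1 − r2), a normal approximation to its power, and independent studies, so that "at least k of 22 studies significant" is a Poisson-binomial tail. These modelling choices, the function names and the study sizes are assumptions for illustration, not the exact procedure used in the paper.

```python
import math
from statistics import NormalDist

_ND = NormalDist()


def study_power(n: int, r2: float, alpha: float = 1e-5) -> float:
    """Approximate power of a 1-df test for a variant explaining a fraction r2 of DNAm variance."""
    ncp = n * r2 / (1.0 - r2)  # non-centrality of the chi-square statistic
    z_crit = _ND.inv_cdf(1.0 - alpha / 2.0)
    mu = math.sqrt(ncp)
    return _ND.cdf(mu - z_crit) + _ND.cdf(-mu - z_crit)


def prob_at_least(k: int, powers) -> float:
    """P(at least k studies reach significance), assuming independent studies."""
    dist = [1.0]  # dist[j] = probability that exactly j studies are significant so far
    for p in powers:
        nxt = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            nxt[j] += pj * (1.0 - p)
            nxt[j + 1] += pj * p
        dist = nxt
    return sum(dist[k:])


def expected_count(n_observed: int, power: float) -> float:
    """Observed number of mQTLs at a given r2 divided by the power to detect them."""
    return n_observed / power


# Hypothetical sample sizes for 22 discovery studies
sizes = [400, 600, 800, 1000] * 5 + [1500, 2000]
r2 = 0.02
powers = [study_power(n, r2) for n in sizes]
print("cis (>=1 of 22):", prob_at_least(1, powers))
print("trans (>=2 of 22):", prob_at_least(2, powers))
print("expected if 1,000 observed:", expected_count(1000, prob_at_least(1, powers)))
```

Requiring replication in two studies for trans associations lowers power relative to the one-study cis rule, which is the loss panel a summarizes.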
Extended Data Fig. 5 Correlation of mQTL effects (p<1e-14) between blood and other tissues.
For each mQTL category, the correlation of genetic effects between tissues (rb) was estimated using the rb method (ref. 25), with the blood mQTLs as reference. DNAm levels are categorized as low (<0.2), intermediate (0.2–0.8) or high (>0.8).
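The DNAm level categories above amount to a simple binning rule. A minimal sketch, assuming each site is summarized by its mean methylation beta value; the function name and example sites are hypothetical, and placing the exact boundary values 0.2 and 0.8 in the intermediate class is an assumption.

```python
from collections import Counter


def dnam_level_category(mean_beta: float) -> str:
    """Bin a site's mean methylation beta value using the caption's thresholds."""
    if not 0.0 <= mean_beta <= 1.0:
        raise ValueError("methylation beta values lie in [0, 1]")
    if mean_beta < 0.2:
        return "low"
    if mean_beta > 0.8:
        return "high"
    return "intermediate"  # 0.2 <= beta <= 0.8 (boundary handling assumed)


# Hypothetical mean beta values for five DNAm sites
site_means = {"cg_a": 0.05, "cg_b": 0.45, "cg_c": 0.91, "cg_d": 0.2, "cg_e": 0.8}
print(Counter(dnam_level_category(b) for b in site_means.values()))
```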
Extended Data Fig. 6 2D enrichment of SNP and DNAm site TFBS annotation.
a, To test whether the annotations of the SNPs involved in trans-mQTLs were specific to the annotations of the DNAm sites that they influence, we compared the real SNP-DNAm site pairs against permuted SNP-DNAm site pairs, in which the biological link between SNP and site is severed whilst the distribution of annotations for the SNPs and sites is maintained. We constructed 100 such permuted datasets. b, SNP and site positions were annotated against genomic features, and we quantified how frequently mQTLs were found for each pair of SNP-DNAm site annotations. This enabled the construction of 2D-annotation matrices for both the real trans-mQTL list and the permuted trans-mQTL lists. c, Distribution of two-dimensional enrichment values of trans-mQTLs. There was substantial departure from the null in the real dataset for all tissues, indicating that the TFBS of a site depended on the TFBS of the SNP that influenced it. d, A bipartite graph of the two-dimensional enrichment for trans-mQTLs: SNP annotations (blue) with pemp < 0.01 after multiple testing correction co-occur with particular site annotations (red).
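The permutation scheme in panels a and b can be sketched as below. Shuffling which DNAm site each SNP is paired with severs the SNP-site link while keeping both marginal annotation distributions, as the caption describes. The function names, the enrichment definition (observed count over mean permuted count) and the add-one empirical p-value are assumptions for illustration; no multiple-testing correction is applied here.

```python
import random
from collections import Counter


def pair_counts(pairs, snp_annot, site_annot):
    """Count trans-mQTLs for every (SNP annotation, site annotation) combination."""
    counts = Counter()
    for snp, site in pairs:
        for a in snp_annot.get(snp, ()):
            for b in site_annot.get(site, ()):
                counts[(a, b)] += 1
    return counts


def two_d_enrichment(pairs, snp_annot, site_annot, n_perm=100, seed=0):
    """Return {(snp_annotation, site_annotation): (enrichment, empirical p)}."""
    rng = random.Random(seed)
    observed = pair_counts(pairs, snp_annot, site_annot)
    snps = [s for s, _ in pairs]
    sites = [c for _, c in pairs]
    perm_counts = []
    for _ in range(n_perm):
        shuffled = sites[:]
        rng.shuffle(shuffled)  # breaks SNP-site links, keeps annotation distributions
        perm_counts.append(pair_counts(zip(snps, shuffled), snp_annot, site_annot))
    results = {}
    for key, obs in observed.items():
        null = [pc.get(key, 0) for pc in perm_counts]
        mean_null = sum(null) / n_perm
        enrichment = obs / mean_null if mean_null > 0 else float("inf")
        p_emp = (1 + sum(x >= obs for x in null)) / (n_perm + 1)
        results[key] = (enrichment, p_emp)
    return results


# Hypothetical data: CTCF-annotated SNPs always act on CTCF-annotated sites
snp_annot = {f"snp{i}": ["CTCF"] if i < 50 else ["NRF1"] for i in range(100)}
site_annot = {f"cg{i}": ["CTCF"] if i < 50 else ["NRF1"] for i in range(100)}
pairs = [(f"snp{i}", f"cg{i}") for i in range(100)]
print(two_d_enrichment(pairs, snp_annot, site_annot)[("CTCF", "CTCF")])
```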
Extended Data Fig. 7 Correspondence of MR estimates amongst multiple independent instruments.
a, To evaluate whether a site sharing a causal variant with a trait could reflect the site being on the causal pathway to the trait, we reasoned that independent instruments for the site should have effects on the outcome consistent with those of the original co-localizing variant. b, Amongst the putative co-localizing signals, 440 involved a DNAm site that had at least one other independent mQTL. The plot shows the causal effect estimate from the original co-localizing signal against the causal effect estimates obtained from the independent variants (n=440). Grey regions represent the 95% confidence interval of the slope. c, Correspondence of MR estimates amongst multiple independent instruments on 36 blood traits. As in a, we reasoned that if a site sharing a causal variant with a blood trait were on the causal pathway to that trait, independent instruments for the site should have effects on the outcome consistent with those of the original co-localizing variant. Amongst the putative co-localizing signals, 30% involved a DNAm site that had at least one other independent mQTL. The plot shows the causal effect estimate from the original co-localizing signal against the causal effect estimates obtained from the independent variants. The HLA region has been removed and betas are plotted.
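The comparison in panels b and c can be sketched with single-instrument Wald ratios (outcome effect divided by exposure effect) for the co-localizing and the independent mQTL of each site, followed by a slope of one set of estimates on the other. All numbers and function names are hypothetical, and fitting the slope through the origin is an assumption, since the caption does not specify the regression model.

```python
def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Single-instrument MR estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    return estimate, se


def slope_through_origin(x, y) -> float:
    """Least-squares slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical (beta_exposure, beta_outcome, se_outcome) for three DNAm sites:
# the co-localizing mQTL and one independent mQTL per site
colocalizing = [(0.40, 0.020, 0.004), (-0.25, 0.015, 0.003), (0.60, -0.030, 0.005)]
independent = [(0.20, 0.011, 0.004), (-0.10, 0.005, 0.003), (0.35, -0.016, 0.005)]

x = [wald_ratio(*row)[0] for row in colocalizing]
y = [wald_ratio(*row)[0] for row in independent]
print("slope:", slope_through_origin(x, y))  # near 1 when the instruments agree
```

A slope near 1 supports a shared causal effect of DNAm on the trait; a slope near 0 suggests the co-localizing signal reflects something other than DNAm lying on the causal pathway.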
Extended Data Fig. 8 Genomic inflation factors for genome-wide scans of causal effects of traits on DNAm sites.
Each trait (x axis) was tested for causal effects on (on average) 317,659 DNAm sites, excluding sites in the MHC region. The p-values from IVW MR analysis were used to estimate the genomic inflation factor for each trait (y axis). Traits are ordered by genomic inflation factor.
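The two quantities behind this figure can be sketched as follows: a fixed-effect inverse-variance weighted (IVW) MR estimate from several instruments, and the genomic inflation factor, the median 1-df chi-square statistic recovered from the p-values divided by its null median (about 0.455). The function names and example numbers are hypothetical; this is a generic illustration, not the pipeline used for the figure.

```python
import math
from statistics import NormalDist, median

_ND = NormalDist()


def ivw_mr(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate, standard error and p-value."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    est = num / den
    se = math.sqrt(1.0 / den)
    return est, se, math.erfc(abs(est / se) / math.sqrt(2.0))


def genomic_inflation(pvalues) -> float:
    """lambda_GC: median 1-df chi-square statistic divided by its null median."""
    # Convert two-sided p-values back to chi-square statistics; clamp to avoid p = 0
    chisq = [_ND.inv_cdf(max(p, 1e-300) / 2.0) ** 2 for p in pvalues]
    null_median = _ND.inv_cdf(0.75) ** 2  # median of chi-square(1), about 0.4549
    return median(chisq) / null_median


# Hypothetical instruments whose outcome effects are exactly twice their exposure effects
print(ivw_mr([0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.05]))

# Under the null (uniform p-values) lambda is 1
null_p = [(i + 0.5) / 1001 for i in range(1001)]
print(round(genomic_inflation(null_p), 3))  # → 1.0
```

Values well above 1 for a trait indicate that its p-values across DNAm sites are systematically smaller than expected under the null.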
Supplementary information
Supplementary Note
Supplementary Methods and Results, Acknowledgements, Supplementary Figs. 1–40, Supplementary References.
Supplementary Tables
Supplementary Tables 1–20.
Supplementary Data 1
Discovery and replication of 169,656 mQTL associations in GoDMC (n = 27,750) and Generation Scotland (n = 5,101).
Supplementary Data 2
The relationship between the variance in DNA methylation explained by mQTL effects in GoDMC, and the estimated contribution of additive genetic effects.
Cite this article
Min, J.L., Hemani, G., Hannon, E. et al. Genomic and phenotypic insights from an atlas of genetic effects on DNA methylation. Nat Genet 53, 1311–1321 (2021). https://doi.org/10.1038/s41588-021-00923-x
DOI: https://doi.org/10.1038/s41588-021-00923-x