
Associations of network-derived metabolite clusters with prevalent type 2 diabetes among adults of Puerto Rican descent
1. Danielle E Haslam1,2,
2. Liming Liang3,4,
3. Dong D Wang1,2,
4. Rachel S Kelly1,
5. Clemens Wittenbecher2,
6. Cynthia M Pérez5,
7. Marijulie Martínez6,
8. Chih-Hao Lee7,
9. Clary B Clish8,
10. David T W Wong9,
11. Laurence D Parnell10,
12. Chao-Qiang Lai10,
13. José M Ordovás11,12,
14. JoAnn E Manson1,4,13,
15. Frank B Hu1,2,4,
16. Meir J Stampfer1,2,4,
17. Katherine L Tucker14,
18. Kaumudi J Joshipura4,6,
19. Shilpa N Bhupathiraju1,2
1. 1Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
2. 2Nutrition, Harvard T H Chan School of Public Health, Boston, Massachusetts, USA
3. 3Biostatistics, Harvard T H Chan School of Public Health, Boston, Massachusetts, USA
4. 4Epidemiology, Harvard T H Chan School of Public Health, Boston, Massachusetts, USA
5. 5Department of Biostatistics and Epidemiology, Graduate School of Public Health, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico
6. 6Center for Clinical Research and Health Promotion, University of Puerto Rico Medical Sciences Campus, San Juan, Puerto Rico
7. 7Molecular Metabolism, Harvard T H Chan School of Public Health, Boston, Massachusetts, USA
8. 8Metabolomics Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
9. 9Center for Oral/Head and Neck Oncology Research, School of Dentistry, University of California Los Angeles, Los Angeles, California, USA
10. 10Agricultural Research Service, Jean Mayer US Department of Agriculture Human Nutrition Research Center on Aging at Tufts University, Boston, Massachusetts, USA
11. 11IMDEA-Food Institute, CEI UAM+CSIC, Madrid, Spain
12. 12Nutrition and Genomics, Jean Mayer US Department of Agriculture Human Nutrition Research Center on Aging at Tufts University, Boston, Massachusetts, USA
13. 13Division of Preventive Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
14. 14Department of Biomedical and Nutritional Sciences and Center for Population Health, University of Massachusetts Lowell, Lowell, Massachusetts, USA
1. Correspondence to Dr Danielle E Haslam; nhdah{at}channing.harvard.edu

## Abstract

Introduction We investigated whether network analysis revealed clusters of coregulated metabolites associated with prevalent type 2 diabetes (T2D) among Puerto Rican adults.

Research design and methods We used liquid chromatography-mass spectrometry to measure fasting plasma metabolites (>600) among participants aged 40–75 years in the Boston Puerto Rican Health Study (BPRHS; discovery) and San Juan Overweight Adult Longitudinal Study (SOALS; replication), with (n=357; n=77) and without (n=322; n=934) T2D, respectively. Among BPRHS participants, we used unsupervised partial correlation network-based methods to identify and calculate metabolite cluster scores. Logistic regression was used to assess cross-sectional associations between metabolite clusters and prevalent T2D at the baseline blood draw in the BPRHS, and significant associations were replicated in SOALS. Inverse-variance weighted fixed-effects meta-analysis was used to combine cohort-specific estimates.

Results Six metabolite clusters were significantly associated with prevalent T2D in the BPRHS and replicated in SOALS (false discovery rate (FDR) <0.05). In a meta-analysis of the two cohorts, the OR and 95% CI (per 1 SD increase in cluster score) for prevalent T2D were as follows for clusters characterized primarily by glucose transport (0.21 (0.16 to 0.30); FDR <0.0001), sphingolipids (0.40 (0.29 to 0.53); FDR <0.0001), acyl cholines (0.35 (0.22 to 0.56); FDR <0.0001), sugar metabolism (2.28 (1.68 to 3.09); FDR <0.0001), branched-chain and aromatic amino acids (2.22 (1.60 to 3.08); FDR <0.0001), and fatty acid biosynthesis (1.54 (1.29 to 1.85); FDR <0.0001). Three additional clusters characterized by amino acid metabolism, cell membrane components, and aromatic amino acid metabolism displayed significant associations with prevalent T2D in the BPRHS, but these associations were not replicated in SOALS.

Conclusions Among Puerto Rican adults, we identified several known and novel metabolite clusters that associated with prevalent T2D.

• diabetes mellitus, type 2
• epidemiology
• metabolism

## Data availability statement

Data are available upon reasonable request. Information on requesting data from the BPRHS (https://www.uml.edu/Research/UML-CPH/Research/bprhs/) and SOALS (http://soals.rcm.upr.edu/) can be found on their respective websites.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

### Significance of this study

#### What is already known about this subject?

• Few studies have used network analysis approaches to identify plasma metabolites associated with type 2 diabetes (T2D), and none have been conducted among individuals of Puerto Rican descent, who are at high risk for T2D and related comorbidities.

#### What are the new findings?

• We have identified six metabolite clusters within a global metabolite network that are associated with prevalent T2D among adults of Puerto Rican descent.

• A novel cluster of acyl cholines was associated with lower odds of T2D.

• We replicated several previously observed associations and revealed novel coregulated metabolites in clusters, including branched-chain and aromatic amino acids, sphingolipids, and cell membrane components, along with metabolites related to glucose transport, fatty acid biosynthesis, and sugar metabolism.

#### How might these results change the focus of research or clinical practice?

• We identified several metabolite pathways that are perturbed among individuals of Puerto Rican descent with T2D.

• These metabolites should be explored further in prospective studies and studies examining lifestyle, genetic, or environmental factors that may influence these metabolites.

## Introduction

The prevalence of type 2 diabetes (T2D) differs substantially by race and ethnicity. Individuals of Puerto Rican (PR) descent experience particularly high rates of T2D and death from related comorbidities.1–3 Plasma metabolomics may help us gain an in-depth understanding of the underlying molecular processes contributing to T2D among this high-risk population.4 However, much of the literature examining metabolomics and T2D focuses on non-Hispanic white populations.5–11

Previous studies implicate several potential metabolites that positively associate with T2D, including 2-aminoadipic acid,5 acylcarnitines,9 12 glutamate,6 and metabolites related to aromatic and branched-chain amino acid (BCAA) metabolism,8 10 11 whereas glutamine negatively associates with T2D.6 However, none of these associations has been evaluated among PRs, who have a unique genetic background and unique lifestyle habits compared with non-Hispanic white populations.13 14 Further, most of these studies used candidate or single-metabolite approaches, with only a few leveraging the correlations between metabolites to identify pathways or networks that may be perturbed among those with T2D.15–17 Distinct subtypes of metabolic alterations often precede or accompany T2D and are defined by distinct patterns of diabetes-related risk factors, such as obesity, insulin resistance, or dyslipidemia.18–20 Examination of the associations between metabolites and diabetes-related risk factors may help identify novel mechanisms underlying these subtypes. Thus, integration of metabolic profiles of T2D with related risk factors among individuals of PR descent could reveal novel metabolic changes occurring after T2D diagnosis and biomarker candidates for early detection of T2D in this high-risk population.

In this study, we used a data-driven, network-based method to reduce 614 named metabolites to distinct clusters that may reflect underlying biological connections between metabolites among PR adults. Then, we used a cross-sectional design to examine whether these clusters were associated with prevalent T2D and diabetes-related risk factors among 679 participants of the Boston Puerto Rican Health Study (BPRHS) and 1011 participants of the San Juan Overweight Adult Longitudinal Study (SOALS). The goal was to identify potential metabolites or metabolic pathways that may be good candidate biomarkers for T2D in PR adults that could be explored in future studies.

## Research design and methods

### Study participants

The BPRHS (discovery) is a well-characterized, population-based longitudinal cohort study of 1500 PR adults aged 45–75 years living in or near Boston, Massachusetts. A previous publication provides details about the design and data collection for the BPRHS.21 Participants were recruited primarily through door-to-door enumeration from areas of high Hispanic density, but also through advertisement in flyers, participation in community events, and referrals from participants. Bilingual interviewers visited participants’ homes to collect data at baseline (2004) and follow-up visits.

The SOALS (replication) is a longitudinal, population-based cohort study of 1300 individuals aged 40–65 years who were residents of the San Juan municipality and its vicinity and were overweight or obese (body mass index (BMI) ≥25 kg/m2). Additional details about the design and data collection are provided in previous publications.22–24 Participants were recruited through different types of mass media, and data collection began in 2011 (baseline).

In this study, we included all BPRHS participants with available metabolomic profiling (n=679). Of these, 322 participants free of T2D (controls) were included in the derivation of the metabolic networks and associations with diabetes-related risk factors. The 357 participants with T2D at the time of blood draw (ie, prevalent cases) were used to examine the associations between metabolic network clusters and prevalent T2D. In an important step to follow best practices in omics-based studies,25 we replicated our findings in a total of 1011 SOALS participants with metabolomics profiling available. Of these, 934 participants were free of T2D and 77 met the criteria for T2D.

### Blood sample collection and metabolomics profiling

All participants were asked to fast for 12 hours in the BPRHS and 10 hours in SOALS before the blood draw. In the BPRHS, blood was drawn at participants’ homes, and a portable centrifuge was used to separate the plasma. The samples were transported to the laboratory in coolers with dry ice, processed, and stored at −70°C. In SOALS, samples were drawn at the baseline visit, centrifuged, and stored at −80°C. Blinded specimens were assayed by Metabolon (Durham, North Carolina) in 2017 for the BPRHS and in 2019 for SOALS, as previously described.26 Briefly, chemical peaks were identified using positive and negative ionization ultra-high-performance liquid chromatography-tandem mass spectrometry and gas chromatography-mass spectrometry. Software developed at Metabolon was used to identify 943 known compounds in the BPRHS and 1062 in SOALS, with differences due to the number of known peaks at the time of the analysis.

Internal quality control samples were included, and injection order was random with respect to case status. Several processing steps were implemented before analysis. Metabolites with a detection rate <75% and xenobiotics were removed. Among the remaining metabolites, undetectable values were imputed at a value equal to half the minimum of each measured metabolite. To reduce the impact of outliers and skewed distributions, we applied an inverse normal transformation. After processing, 614 named metabolites were available for analysis in both the BPRHS and SOALS cohorts.
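The processing steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the use of `None` for undetected values, and the rank offset `c=0.5` in the inverse normal transformation are assumptions made for illustration, and the removal of xenobiotics (which requires compound annotations) is omitted.

```python
from statistics import NormalDist

def detection_rate(values):
    """Fraction of samples in which the metabolite was detected (not None)."""
    return sum(v is not None for v in values) / len(values)

def impute_half_min(values):
    """Replace undetected values (None) with half the minimum detected value."""
    floor = min(v for v in values if v is not None) / 2
    return [floor if v is None else v for v in values]

def inverse_normal_transform(values, c=0.5):
    """Rank-based inverse normal transformation; ties get their average rank.
    The offset c is an assumption; the source does not state which was used."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

def preprocess(metabolites, min_detection=0.75):
    """metabolites: dict mapping a metabolite name to a list of values."""
    clean = {}
    for name, values in metabolites.items():
        if detection_rate(values) < min_detection:
            continue  # drop metabolites detected in fewer than 75% of samples
        clean[name] = inverse_normal_transform(impute_half_min(values))
    return clean

# Hypothetical toy data: met_b is detected in only 50% of samples.
raw = {"met_a": [4.0, None, 2.0, 8.0], "met_b": [None, None, 1.0, 3.0]}
clean = preprocess(raw)
```

In this toy input, `met_b` is dropped for low detection, and the undetected `met_a` value is imputed as 1.0 (half of the minimum, 2.0) before transformation.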

### T2D cases and related risk factors

Prevalent T2D at baseline was defined using the American Diabetes Association criteria: fasting plasma glucose concentration ≥126 mg/dL (7.0 mmol/L), glycosylated hemoglobin (HbA1c) ≥6.5% (48 mmol/mol), or use of antidiabetic medication.27 In SOALS, additional T2D cases were identified by 2-hour plasma glucose concentration ≥200 mg/dL (11.1 mmol/L) during an oral glucose tolerance test (OGTT), a measure that was not available among BPRHS participants. In the BPRHS, glucose was measured using an enzymatic, kinetic reaction (OSCR6121; Olympus America, Melville, New York), where the intra-assay and interassay coefficients of variation (CV%) for serum glucose were 2% and 3.4%, respectively. In SOALS, glucose was assessed using a SIRRUS analyzer (intra-assay CV%: 1.21%; interassay: 3.06%) and HbA1c was assessed using a latex immunoagglutination inhibition method (intra-assay CV%: 2.89%; interassay: 1.88%).
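As an illustration, the case definition above can be written as a small Python function. The function and parameter names are hypothetical, and treating missing measurements as `None` is an assumption; the OGTT argument would be left unset for BPRHS participants, for whom that measure was unavailable.

```python
from typing import Optional

def has_prevalent_t2d(fasting_glucose_mg_dl: Optional[float],
                      hba1c_percent: Optional[float],
                      on_antidiabetic_medication: bool,
                      ogtt_2h_glucose_mg_dl: Optional[float] = None) -> bool:
    """Apply the thresholds stated in the text; any one criterion suffices."""
    if on_antidiabetic_medication:
        return True
    if fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126:
        return True
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    # OGTT criterion: applied only where a 2-hour value exists (SOALS)
    if ogtt_2h_glucose_mg_dl is not None and ogtt_2h_glucose_mg_dl >= 200:
        return True
    return False
```

Because the OGTT criterion can only add cases, a participant with normal fasting glucose and HbA1c could be classified as a case in SOALS but not in the BPRHS.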

Additional diabetes-related risk factors assayed among BPRHS and SOALS participants at baseline include insulin, Homeostatic Model Assessment for Insulin Resistance (HOMA-IR), HbA1c, high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG). All analytes were measured using standardized protocols with intra-assay and interassay CV% ranging from 1.5% to <10%. Height and weight were measured in duplicate at baseline visits.21 22 BMI was calculated as weight (kg) divided by height squared (m2). Waist circumference (WC) was measured at the umbilical level and recorded to the nearest 0.1 cm.
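For reference, the two derived measures can be sketched as follows. BMI follows the definition in the text. The HOMA-IR formula shown is the commonly used approximation (fasting glucose in mg/dL times fasting insulin in µU/mL, divided by 405); the source does not state which formula or units the cohorts used, so this is an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2

def homa_ir(fasting_glucose_mg_dl: float, fasting_insulin_uU_ml: float) -> float:
    """Common HOMA-IR approximation; assumed here, not confirmed by the source."""
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0
```

For example, a participant weighing 81 kg with a height of 1.8 m has a BMI of 25 kg/m2, the SOALS inclusion threshold for overweight.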

### Measurement of covariates

Validated questionnaires were used to collect information on age, education, household income, medication use, family history, and health behaviors, including history of smoking and alcohol consumption. A validated Food Frequency Questionnaire (FFQ) designed for the PR population was used to assess habitual food consumption and nutrient intakes in the BPRHS.28 Reported food intakes were converted to food groups to assess adherence to the American Heart Association Diet Score (AHA-DS).29 An FFQ was administered to only a subset of SOALS participants, so we did not include diet as a covariate in SOALS analyses. Physical activity was assessed using a modified version of the Paffenbarger questionnaire30 31 in BPRHS participants. A metabolic equivalent (MET) score was calculated based on a questionnaire that assessed the time and frequency of physical activities during a typical week among SOALS participants. We assessed stress using the Spanish version of the Perceived Stress Scale.32 In the BPRHS, acculturation was captured through the bidimensional Acculturation Scale for Hispanics, which yields a score of up to 100, indicating full acculturation with fluent use of the English language33 (not applicable in SOALS).

### Statistical methods

Among participants without T2D, normalized levels of all available metabolites were used to construct a global metabolite network (ie, network capturing an extensive range of biological processes) based on Spearman’s rank partial correlations separately in the BPRHS and SOALS using the igraph R package (https://cran.r-project.org).34 Each correlation coefficient for a pair of metabolites was calculated conditioning on the remaining metabolites. All edges and paths between metabolites with a Benjamini-Hochberg false discovery rate (FDR) <0.05 in both the BPRHS and SOALS networks were retained to create a final global metabolite network. The structure and connectedness of this large metabolic network were examined using an algorithm developed to identify distinct clusters (greedy optimization algorithm35) and calculate weights for metabolites within each cluster.36 The metabolite weights were then used to calculate a weighted sum of metabolite levels among participants for each cluster (metabolite cluster score).
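To make the network construction more concrete, the following Python sketch shows one standard way to obtain partial correlations (each pair conditioned on all remaining variables) from a Spearman correlation matrix by inverting it, and how a cluster score is formed as a weighted sum. This is an illustration under assumptions, not the authors' implementation, which used R: all function names are hypothetical, the FDR-based edge filtering and the greedy clustering step are omitted, and the weights are treated as given because the source does not describe how they were derived beyond citing reference 36.

```python
def rank(values):
    """1-based ranks with ties averaged, as used by Spearman correlation."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_matrix(columns):
    """columns: one equal-length list of levels per metabolite."""
    r = [rank(c) for c in columns]
    return [[pearson(a, b) for b in r] for a in r]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small, non-singular input)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Partial correlation of each pair given all others, via the precision matrix."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / (prec[i][i] * prec[j][j]) ** 0.5
             for j in range(n)] for i in range(n)]

def cluster_score(levels, weights):
    """Weighted sum of one participant's metabolite levels within a cluster."""
    return sum(weights[m] * levels[m] for m in weights)

# Hypothetical correlation matrix with a chain structure A - B - C:
# A and C are correlated only through B, so their partial correlation vanishes.
corr = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
pcor = partial_correlations(corr)
```

The toy matrix shows why partial correlations are useful for network construction: metabolites A and C have a marginal correlation of 0.25, but no edge would be drawn between them once B is conditioned on.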

We first evaluated the relationships between all metabolite cluster scores (continuous) and prevalent T2D through logistic regression models in the discovery cohort (BPRHS). OR and 95% CI correspond to the odds of prevalent T2D with each 1 SD increase in the sum of weighted metabolite concentrations in each cluster. Second, clusters significantly associated with T2D in the BPRHS (FDR <0.05) were tested in the replication cohort (SOALS). For replication, weights derived from the final global network were used to compute continuous cluster scores in SOALS. Third, ORs in the discovery and replication cohorts for clusters found to be significantly associated with prevalent T2D in the BPRHS were combined through inverse-variance weighted, fixed-effects meta-analyses using the meta R package (https://cran.r-project.org). The I2 statistic was used to assess heterogeneity among the cohorts. We built three multivariable models to assess the influence of covariates: model 1, adjusted for age and sex; model 2, adjusted for model 1 covariates plus smoking status (former, current, never), education (≤8th grade, 9th–12th grade or GED (General Educational Development Test), college/some graduate school), physical activity (continuous score (BPRHS) or METs (SOALS)), alcohol intake (non-drinker, moderate (women: 1 drink/day; men: 1–2 drinks/day), heavy (women: >1 drink/day; men: >2 drinks/day)), lipid-lowering medication (yes/no), hypertension medication (yes/no), income (<$20 000/year, ≥$20 000/year), acculturation (%; BPRHS only), Perceived Stress Score (continuous score), diet quality (AHA-DS continuous score; BPRHS only), mouthwash use (yes/no; SOALS only), and family history of T2D (yes/no; SOALS only); and model 3, adjusted for model 2 covariates plus WC and BMI. We subjectively assigned cluster names according to pathway membership of the majority of or heavily weighted metabolites in that cluster. 
Sensitivity analyses were conducted replacing the weighted metabolite cluster score with an unweighted score, along with a joint model including all clusters significantly associated with prevalent T2D in the BPRHS. To provide additional insight into which metabolites may be driving the overall cluster associations, we evaluated the associations between individual metabolites within clusters that significantly associated with prevalent T2D in both cohorts.
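The meta-analysis step can be sketched in Python as follows. This is a minimal illustration of inverse-variance weighted, fixed-effects pooling with the I2 statistic, not a reproduction of the meta R package; the function names are hypothetical and the example inputs are invented rather than taken from the study. Each cohort-specific OR per 1 SD is the exponentiated logistic regression coefficient, so pooling is done on the log scale.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% CI

def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z_95)

def fixed_effect_meta(estimates):
    """estimates: list of (log_or, se) pairs, one per cohort.
    Returns (pooled OR, lower 95% limit, upper 95% limit, I2 in percent)."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # Cochran's Q measures between-cohort variation around the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (math.exp(pooled),
            math.exp(pooled - Z_95 * se_pooled),
            math.exp(pooled + Z_95 * se_pooled),
            i2)

# Invented example: two cohorts reporting the same OR with equal precision.
same = fixed_effect_meta([(math.log(2.0), 0.2), (math.log(2.0), 0.2)])
# Invented example: two cohorts with opposite effects and equal precision.
opposite = fixed_effect_meta([(math.log(0.5), 0.1), (math.log(2.0), 0.1)])
```

In the second example the pooled OR is 1.0 with very high I2, which illustrates why heterogeneity should be inspected alongside a fixed-effects estimate.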

For the metabolite clusters significantly associated with prevalent T2D in the BPRHS (FDR <0.05), we examined cross-sectional associations between metabolite cluster scores and diabetes-related risk factors among participants in the BPRHS and SOALS participants without T2D. Linear regression models adjusting for model 3 covariates (excluding BMI and WC for those outcomes) were used to quantify the associations between 1 SD increase in the weighted sum of metabolite clusters and glucose, insulin, HOMA-IR, HbA1c, HDL-C, TG, BMI, and WC. To increase interpretability, adjusted means by quartiles of metabolite cluster scores were used to present the results for diabetes-related risk factors. Sensitivity analyses were conducted among all participants, adjusting for T2D status.

All p values were corrected for multiple testing using the Benjamini-Hochberg procedure with targeted FDR <0.05.37 All statistical analyses were completed using SAS (V.9.4) or R (V.3.6.0) statistical software, and individuals with missing covariate data were excluded from these statistical models.
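The Benjamini-Hochberg adjustment can be sketched in a few lines of Python. The function name is hypothetical and the p values in the example are invented; the sketch returns adjusted values that can be compared directly with the 0.05 threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for position in range(m, 0, -1):
        idx = order[position - 1]
        running_min = min(running_min, p_values[idx] * m / position)
        adjusted[idx] = running_min
    return adjusted

# Invented p values for four tests
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q < 0.05 for q in fdr]
```

With these inputs only the first test remains below 0.05 after adjustment, although three of the four raw p values were below 0.05.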

## Results

In the BPRHS, participants with T2D tended to be older, have higher WC, BMI, and TG, have lower HDL-C, and consume less alcohol, and they were more likely to be prescribed lipid-lowering or hypertension medications (table 1). In SOALS, participants with T2D tended to have higher WC and TG and were less likely to be female or prescribed hypertension medications.

Table 1

Baseline characteristics of study participants in BPRHS and SOALS stratified by T2D status*

In the global metabolite network derived from the BPRHS and SOALS participants without T2D, 69 metabolite clusters containing ≥2 metabolites were identified. After FDR correction, a total of nine metabolite cluster scores were significantly associated with prevalent T2D among the BPRHS participants (discovery) in the fully adjusted model (table 2). Metabolites included in each of the nine clusters are provided in online supplemental table S1, network representations are provided in figure 1, and the results from multivariable models 1 and 2 are presented in online supplemental table S2. Four clusters that primarily contained metabolites related to glucose transport (cluster 33; figure 1A), sphingolipids (cluster 2; figure 1B), acyl cholines (cluster 22; figure 1F), and cell membrane components (cluster 26; figure 1H) were associated with lower odds of T2D, and five clusters characterized by sugar metabolism (cluster 53; figure 1C), BCAA and aromatic amino acids (cluster 21; figure 1D), fatty acid biosynthesis (cluster 40; figure 1E), amino acid metabolism (cluster 14; figure 1G), and aromatic amino acid metabolism (cluster 25; figure 1I) were associated with higher odds of T2D. Associations between all 69 of the identified metabolite clusters and prevalent T2D among BPRHS participants are presented in online supplemental table S3. Associations between individual metabolites and prevalent T2D in the statistically significant clusters are presented in online supplemental figure S1.

Figure 1

Metabolite clusters significantly associated with prevalent type 2 diabetes in Boston Puerto Rican Health Study participants. Nodes represent metabolites and edges represent partial correlation with p<0.05 for metabolite pairs. Direction of association between metabolites and prevalent type 2 diabetes is indicated by node shape, where circles represent inverse and squares represent positive associations. Nodes are colored according to metabolite class. The following names have been given to each cluster based on the metabolites included: (A) cluster 33: glucose transport; (B) cluster 2: sphingolipids; (C) cluster 53: sugar metabolism; (D) cluster 21: BCAA and aromatic amino acids; (E) cluster 40: fatty acid biosynthesis; (F) cluster 22: acyl cholines; (G) cluster 14: amino acid metabolism; (H) cluster 26: cell membrane components; and (I) cluster 25: aromatic amino acid metabolism. *Indicates a statistically significant association (FDR<0.05) (blue=BPRHS; orange=SOALS). BCAA, branched-chain amino acids; GPC, glycerophosphocholine; SPH, sphingomyelin.

Table 2

OR (95% CI) for prevalent T2D per 1 SD difference in weighted sum of metabolite concentrations in top metabolite clusters among BPRHS (n=679) and SOALS (n=1011) participants*

Six of the nine T2D-associated clusters from the BPRHS were replicated in fully adjusted models among SOALS participants (FDR <0.05: glucose transport, sphingolipids, sugar metabolism, BCAA and aromatic amino acids, fatty acid biosynthesis, and acyl cholines) (table 2). In a meta-analysis of BPRHS and SOALS results, three of the replicated clusters associated with lower odds of T2D: glucose transport (OR (95% CI) 0.21 (0.16 to 0.30); FDR <0.0001), sphingolipids (OR (95% CI) 0.40 (0.29 to 0.53); FDR <0.0001), and acyl cholines (OR (95% CI) 0.35 (0.22 to 0.56); FDR <0.0001), and three associated with higher odds of T2D: sugar metabolism (OR (95% CI) 2.28 (1.68 to 3.09); FDR <0.0001), BCAA and aromatic amino acids (OR (95% CI) 2.22 (1.60 to 3.08); FDR <0.0001), and fatty acid biosynthesis (OR (95% CI) 1.54 (1.29 to 1.85); FDR <0.0001). The results were similar in sensitivity analyses using an unweighted metabolite cluster scoring scheme, although attenuated effect sizes were observed (online supplemental table S4). In a sensitivity analysis including all clusters significantly associated with T2D in BPRHS participants in one model, the glucose transport, sugar metabolism, BCAA and aromatic amino acids, and fatty acid biosynthesis clusters remained significant (online supplemental table S5).

The six T2D-associated clusters that were replicated in SOALS were all significantly associated with various diabetes-related risk factors among participants without T2D in the BPRHS and/or SOALS cohorts (figure 2, online supplemental tables S6 and S7). The larger sample size resulted in more statistically significant findings in the SOALS cohort, but trends were similar in the BPRHS. SOALS participants in the highest versus lowest quartile of the glucose transport cluster had significantly higher TG (SOALS FDR <0.0001) and HDL-C (SOALS FDR=0.02). SOALS participants in the highest versus lowest quartile of the sphingolipids cluster had significantly higher HDL-C concentrations (SOALS FDR <0.0001). Participants in the highest versus lowest quartile for the acyl cholines cluster had significantly higher TG (SOALS FDR <0.0001) and HDL-C (BPRHS FDR=0.02; SOALS FDR <0.0001) concentrations, along with lower WC (SOALS FDR=0.05), glucose (SOALS FDR <0.0001), insulin (SOALS FDR <0.0001), and HOMA-IR (SOALS FDR <0.0001). SOALS participants in the highest versus lowest quartile of the sugar metabolism cluster had higher HbA1c (SOALS FDR=0.05) and TG (SOALS FDR <0.0001). Participants in the highest versus lowest quartile of the BCAA and aromatic amino acid cluster had higher TG concentrations in both cohorts (BPRHS FDR=0.02; SOALS FDR <0.0001), and higher glucose (SOALS FDR=0.03), insulin (SOALS FDR <0.0001), HOMA-IR (SOALS FDR <0.0001), and HbA1c (SOALS FDR=0.001) in the SOALS cohort. Participants in the highest versus lowest quartile of the fatty acid biosynthesis cluster had higher HDL-C concentrations (BPRHS FDR=0.002; SOALS FDR=0.01) in both cohorts. 
When examining the associations between metabolite clusters and diabetes-related risk factors including participants with T2D and adjusting for diabetes status, the associations were generally similar, with many trends reaching statistical significance in the BPRHS due to an increase in sample size (online supplemental tables S8 and S9).

Figure 2

Adjusted means by quartile of weighted T2D-associated metabolite cluster scores among BPRHS (n=322) and SOALS (n=934) participants without T2D. Cross-sectional associations adjusted for age, sex, diet quality (BPRHS only), smoking status, education, physical activity, alcohol intake, lipid-lowering medication use, hypertension medication use, income, acculturation (BPRHS only), Perceived Stress Score, mouthwash use (SOALS only), family history of T2D (SOALS only), WC (except WC and BMI outcomes), and BMI (except WC and BMI outcomes). P values are corrected for multiple exposures and outcomes using the Benjamini-Hochberg FDR. BCAA, branched-chain amino acid; BMI, body mass index; BPRHS, Boston Puerto Rican Health Study; FDR, false discovery rate; HbA1c, glycosylated hemoglobin; HDL-C, high-density lipoprotein cholesterol; HOMA-IR, Homeostatic Model Assessment of Insulin Resistance; SOALS, San Juan Overweight Adult Longitudinal Study; T2D, type 2 diabetes; WC, waist circumference.

Although the three other T2D-associated clusters in BPRHS (amino acid metabolism (cluster 14), cell membrane components (cluster 26), and aromatic amino acid metabolism (cluster 25)) were not significantly associated with prevalent T2D in SOALS, a congruent direction of association was observed for two of the clusters (table 2). Additionally, participants in the highest versus lowest quartile of the amino acid metabolism cluster had significantly higher insulin concentrations in both cohorts (BPRHS FDR=0.003; SOALS FDR=0.0001) and HOMA-IR in the SOALS cohort (SOALS FDR=0.0003). Participants in the highest versus lowest quartile of the cell membrane components cluster in both cohorts had significantly higher TG concentrations (BPRHS FDR=0.002; SOALS FDR <0.0001). No associations between the aromatic amino acid metabolism cluster score and diabetes-related risk factors were observed in either cohort (all FDR >0.05).

## Discussion

We identified six metabolite clusters within a global metabolite network that were consistently associated with prevalent T2D in two independent cohorts of individuals of PR descent. To our knowledge, this is the first study to comprehensively compare the levels of over 600 plasma metabolites among individuals of PR descent with and without T2D. Participants were less likely to have T2D if they had higher concentrations of metabolites in three clusters characterized by sphingolipids, acyl cholines, or metabolites related to glucose transport. In contrast, participants with higher concentrations of metabolites within the three clusters characterized by sugar metabolism, BCAA and aromatic amino acid metabolism, and fatty acid biosynthesis had a higher likelihood of T2D. Additional metabolite clusters, including metabolites related to amino acid metabolism and components of cell membranes, also displayed suggestive associations with prevalent T2D. Several associations between metabolite clusters and known diabetes-related risk factors were also observed. These findings replicate known metabolites and identify novel metabolites associated with prevalent T2D, which may reflect biological changes occurring among individuals of PR descent with T2D.

This is the first population-based study to identify a cluster of acyl choline metabolites that was associated with lower odds of prevalent T2D. A previous small study (n=107) observed that arachidonylcholine (one component of the acyl choline cluster) levels were significantly lower among obese patients with T2D compared with obese insulin-sensitive patients.16 Consistently, we identified additional acyl cholines associated with lower odds of T2D, along with novel associations with higher HDL-C and lower WC, glucose, insulin, and HOMA-IR. The acyl choline cluster was also surprisingly associated with higher TG concentrations, suggesting that participants with high concentrations of metabolites in the acyl choline cluster represent a unique group of individuals that have higher TG concentrations but are less likely to have T2D. The glucose transport cluster that was associated with lower odds of T2D contains a well-established biomarker of short-term glycemic control, 1,5-anhydroglucitol (1,5-AG), which has been negatively correlated with plasma glucose concentrations.38 However, we identified additional metabolites in the same network cluster as 1,5-AG that also associated with lower odds of T2D. Several of these metabolites have been potentially linked to glucose transport in animal and cell studies, such as N-acetyltaurine39 and phosphatidylcholine,40 but the biological connections between the metabolites in this cluster are unclear. The sphingolipids cluster was associated with a lower prevalence of T2D, similar to previous studies.41–43 This includes one large study in a mainly non-Hispanic white population (n=3082)42 and a moderate-sized (n=1035) study in an ethnically diverse population.43 We have expanded these findings to PR adults and identified the associations between sphingolipids and higher HDL-C concentrations. 
Previous studies have linked higher HDL-C concentrations to better pancreatic β cell function,44 45 suggesting that further exploration of how metabolites in this sphingolipid cluster influence pancreatic β cell function may be fruitful.

BCAA (isoleucine, leucine, and valine) and aromatic amino acid (tyrosine and phenylalanine) metabolites have been consistently associated with higher T2D risk and diabetes-related risk factors in prospective cohort studies among mainly non-Hispanic white populations, as well as cross-sectional and case–control studies in non-Hispanic white, Asian, and African American populations.8 Only one small prospective study included 50 participants of Hispanic ethnicity,46 and with their limited statistical power, they did not observe a higher risk of T2D among Hispanic participants with elevated BCAA and aromatic amino acid metabolites. In the current study, we identified a cluster of BCAA and aromatic amino acid metabolites that associated with higher odds of prevalent T2D, higher concentrations of glucose, insulin, and TG, as well as higher HOMA-IR and HbA1c among PR adults. We also replicated several associations of metabolites related to sugar metabolism (driven largely by mannitol and maltose) and fatty acid biosynthesis (including 2-hydroxybutyrate, 3-hydroxybutyrate, and several fatty acids) with higher T2D risk.8 42 47 48 The clusters we identified include additional related metabolites that may point to mechanisms for how these metabolites could help characterize T2D phenotypes.

Additional clusters of metabolites that were significantly associated with prevalent T2D among BPRHS participants were not replicated among SOALS participants (amino acid metabolism, cell membrane components, and aromatic amino acid metabolites). However, some notable associations between these clusters and diabetes-related risk factors were observed. For example, a cluster of cell membrane components, including phosphatidylcholines, lysophospholipids, and sphingolipids, displayed a significant association with higher TG concentrations in both cohorts. Although the cell membrane components cluster was associated with lower odds of T2D among BPRHS participants, it was not associated with T2D prevalence among SOALS participants. This inconsistent association is likely due to the difference in the direction of associations for the individual metabolites within the cell membrane components cluster in the two cohorts. Sphingomyelin (d18:2/24:2) was consistently associated with a lower prevalence of T2D in both cohorts, but the associations for other metabolites were inconsistent, which could be due to chance or differences in study methodology or population characteristics. Cell membrane structure likely influences T2D risk,8 49 50 but further research on how complex changes in cell membrane structure and fluidity influence diabetes-related metabolic changes is warranted. The amino acid metabolism cluster also contained several individual metabolites that were strongly associated with the prevalence of T2D in consistent directions. This included glycine and leucine, which have been consistently linked to lower and higher T2D risk in previous studies, respectively.8

Our study has several strengths and limitations that influence the interpretation of the results. Network analysis allows us to make inferences about a large number of metabolites from data sets of moderate sample size. The cross-sectional design of this study provides no information on the directionality of associations. We therefore derived our metabolite network and examined the associations between metabolite clusters and diabetes-related risk factors among individuals without T2D. We also strengthened our findings through replication in an independent population. Although lack of consistency in some observed associations could be due to differences in methodology between the BPRHS and SOALS cohorts, our ability to identify consistent differences in metabolites by T2D status despite these differences lends strength to our findings. OGTTs were not conducted during the BPRHS baseline visit, so a portion of participants may have been misclassified as not having T2D when an OGTT would have indicated the presence of T2D. This could lead to attenuation of the association between metabolite clusters and T2D in the BPRHS. Residual confounding is always possible in observational studies, although we sought to minimize it by adjusting for many potential confounders and covariates. Given these weaknesses, findings should be interpreted as largely hypothesis-generating and warrant further examination using prospective study designs.
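
The attenuation argument above can be illustrated with a short calculation. The sketch below is not part of the study's analysis: the prevalences, the case-detection sensitivity, and the function names are all hypothetical. It assumes that true cases are missed at the same rate regardless of metabolite level and that no non-case is recorded as a case.

```python
def odds_ratio(p_high: float, p_low: float) -> float:
    """Odds ratio comparing outcome prevalence in two groups."""
    return (p_high / (1 - p_high)) / (p_low / (1 - p_low))


def observed_odds_ratio(p_high: float, p_low: float, sensitivity: float) -> float:
    """Odds ratio when only a fraction `sensitivity` of true cases is detected.

    Assumes the same sensitivity in both groups and perfect specificity,
    so the recorded prevalence in each group is sensitivity * true prevalence.
    """
    return odds_ratio(sensitivity * p_high, sensitivity * p_low)


# Hypothetical values: true T2D prevalence of 40% among participants with a
# high score for a risk-increasing cluster and 25% among those with a low
# score, with 80% of true cases detected without an OGTT.
true_or = odds_ratio(0.40, 0.25)
obs_or = observed_odds_ratio(0.40, 0.25, 0.80)
print(round(true_or, 2), round(obs_or, 2))
```

Under these assumed values the true odds ratio is about 2.0, while the odds ratio computed from the misclassified outcome is about 1.88, that is, closer to the null.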

In summary, we used a data-driven, network-based method to identify several metabolic pathways that are perturbed among individuals of PR descent with T2D, providing deeper insight into pathways that should be further explored to understand the pathogenesis of T2D in this high-risk group. A novel cluster of acyl cholines was associated with lower odds of T2D and should be investigated in other populations. We replicated several previously observed associations between metabolites and T2D and used network modules to reveal novel correlated metabolites that were also associated with the prevalence of T2D. Additional associations between the identified metabolite clusters and traditional diabetes-related risk factors provide insight into distinct T2D phenotypes that may be characterized by integrative alterations in these metabolites and traditional risk factors. More prospective studies are needed to assess which lifestyle, genetic, or environmental factors may influence these metabolites and whether alterations in these pathways precede T2D or are a consequence of it.

## Data availability statement

Data are available upon reasonable request. Information on requesting data from the BPRHS (https://www.uml.edu/Research/UML-CPH/Research/bprhs/) and SOALS (http://soals.rcm.upr.edu/) can be found on their respective websites.

## Acknowledgments

The authors thank all BPRHS and SOALS participants and staff for their contribution to these studies.

## Footnotes

• Presented at: Part of this study was presented as an abstract at the American Heart Association Scientific Sessions, November 13–17, 2020, a virtual experience.

• Contributors: DEH, LL, and DDW performed the statistical analyses. DEH and SNB drafted the manuscript. DEH, LL, DDW, RSK, CW, CMP, MM, C-HL, CBC, DTWW, LDP, C-QL, JMO, JEM, FBH, MJS, KLT, KJJ, and SNB contributed to interpretation of the data and revised the article critically for important intellectual content. All authors approved the final version of the manuscript. DEH and SNB are guarantors of the work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and accuracy of the data analysis.

• Funding: This work is supported by the National Institutes of Health (NIH): 2T32CA009001 (DEH), 1K01DK107804-01A1 (SNB), and 1R01DK120560-01 (DEH, LL, C-HL, DTWW, FBH, MJS, KLT, KJJ, and SNB).

• Competing interests: DTWW is consultant to Mars Wrigley and Colgate-Palmolive and has equity in Liquid Diagnostics.

• Provenance and peer review: Not commissioned; externally peer reviewed.