
Mediation of an association between neighborhood socioeconomic environment and type 2 diabetes through the leisure-time physical activity environment in an analysis of three independent samples
  1. Katherine A Moon (1)
  2. Cara M Nordberg (2)
  3. Stephanie L Orstad (3, 4)
  4. Aowen Zhu (5)
  5. Jalal Uddin (5)
  6. Priscilla Lopez (3)
  7. Mark D Schwartz (3, 6)
  8. Victoria Ryan (7)
  9. Annemarie G Hirsch (2)
  10. Brian S Schwartz (1, 2)
  11. April P Carson (8)
  12. D Leann Long (9)
  13. Melissa Meeker (7)
  14. Janene Brown (7)
  15. Gina S Lovasi (7, 10)
  16. Samranchana Adhikari (4)
  17. Rania Kanchi (4)
  18. Sanja Avramovic (11)
  19. Giuseppina Imperatore (12)
  20. Melissa N Poulsen (2)
  1. Department of Environmental Health and Engineering, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  2. Department of Population Health Sciences, Geisinger, Danville, Pennsylvania, USA
  3. Department of Population Health, New York University Grossman School of Medicine, New York, New York, USA
  4. Department of Medicine, Division of General Internal Medicine and Clinical Innovation, New York University Grossman School of Medicine, New York, NY, USA
  5. Department of Epidemiology, The University of Alabama at Birmingham School of Public Health, Birmingham, Alabama, USA
  6. The Department of Veterans Affairs, New York Harbor Healthcare System, New York, NY, USA
  7. Department of Epidemiology and Biostatistics, Drexel University Dornsife School of Public Health, Philadelphia, Pennsylvania, USA
  8. Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
  9. Department of Biostatistics, University of Alabama at Birmingham School of Public Health, Birmingham, Alabama, USA
  10. The Urban Health Collaborative, Drexel University Dornsife School of Public Health, Philadelphia, PA, USA
  11. Department of Health Administration and Policy, George Mason University, Fairfax, Virginia, USA
  12. Surveillance, Epidemiology, Economics, and Statistics Branch, Division of Diabetes Translation, Centers for Disease Control and Prevention (CDC), Atlanta, Georgia, USA
  1. Correspondence to Dr Katherine A Moon; kmoon9{at}


Introduction: Inequitable access to leisure-time physical activity (LTPA) resources may explain geographic disparities in type 2 diabetes (T2D). We evaluated whether the neighborhood socioeconomic environment (NSEE) affects T2D through the LTPA environment.

Research design and methods: We conducted analyses in three study samples: the national Veterans Administration Diabetes Risk (VADR) cohort, comprising electronic health records (EHR) of 4.1 million T2D-free veterans; the national prospective REasons for Geographic and Racial Differences in Stroke (REGARDS) cohort (11 208 T2D-free participants); and a case–control study of Geisinger EHR data in Pennsylvania (15 888 T2D cases). New-onset T2D was defined using diagnosis, laboratory, and medication data. We harmonized neighborhood-level variables, including the exposure, confounders, and effect modifiers. We measured NSEE with a summary index of six census tract indicators. The LTPA environment was measured by the density of physical activity (PA) facilities (gyms and other commercial facilities) within street network buffers and by population-weighted distance to parks. We estimated natural direct and indirect effects for each mediator, stratified by community type.

Results: The magnitudes of the indirect effects were generally small, and their direction differed by community type and study sample. The most consistent findings were for mediation via PA facility density in rural communities, where we observed positive indirect effects of 1.53 (95% CI 0.25 to 3.05) in REGARDS and 0.0066 (95% CI 0.0038 to 0.0099) in VADR, expressed as differences in T2D incidence rates (multiplied by 100) comparing the highest versus lowest quartiles of NSEE. No mediation was evident in Geisinger.

Conclusions: PA facility density and distance to parks did not substantially mediate the relation between NSEE and T2D. Our heterogeneous results suggest that approaches to reduce T2D through changes to the LTPA environment require local tailoring.

  • Diabetes Mellitus, Type 2
  • Primary Prevention
  • Physical Fitness

Data availability statement

Data are available on reasonable request. Deidentified data are available on request with IRB approval and a data use agreement.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • Inequitable access to the leisure-time physical activity (LTPA) environment could explain disparities in type 2 diabetes (T2D).


  • We evaluated whether the LTPA environment mediates the association between neighborhood socioeconomic environment (NSEE) and T2D. We found little evidence that the density of PA facilities or the distance to parks mediated NSEE-T2D associations.


  • Approaches to reduce T2D through changes to the LTPA environment likely require local tailoring.


Introduction

Eleven per cent of US adults (37.3 million) had diabetes in 2019.1 The economic burden of diabetes, including costs directly related to diabetes care and to the increased risk of neurological, peripheral vascular, cardiovascular, renal, endocrine/metabolic, ophthalmic, and other complications, accounted for more than one in eight healthcare dollars spent in 2017.2 The American Diabetes Association recently identified social determinants of health, including socioeconomic status (SES) and the neighborhood physical and built environment, as priorities for research and intervention to prevent diabetes.3

Despite differences in measurement, epidemiologic studies across a variety of study populations and designs have found graded associations between greater area-level socioeconomic disadvantage and higher risk of type 2 diabetes (T2D).4–8 In an analysis of three study samples in the USA in the Diabetes Location, Environmental Attributes, and Disparities (LEAD) Network, greater socioeconomic disadvantage was generally associated with a higher relative risk of T2D in most community types, from rural to urban.9 Stark geographic disparities in the incidence and prevalence of T2D across regions, counties, and neighborhoods may be attributable to community-level structural causes.10–14

Features of the social and physical environment can influence disease risk by promoting or hindering healthy behaviors such as physical activity (PA) and via chronic psychosocial or physical stressors.15 Higher levels of leisure-time PA (LTPA) are well known to reduce the risk of T2D.16 Proximity to or greater availability of LTPA resources, such as PA facilities and parks, has been associated with increased PA17–20 and reduced T2D risk,21–23 although most studies have examined urban and suburban communities, and rural communities remain less studied. Studies have also found that lower-SES neighborhoods have less access to parks and PA facilities,24–26 suggesting that inequitable access to LTPA resources may be one mechanistic pathway through which neighborhood SES affects T2D risk. No prior studies have formally examined whether access to parks or PA facilities mediates the association between lower neighborhood SES and higher risk of T2D.

Evaluation of mediating pathways between neighborhood SES and T2D raises several methodologic challenges. Measures of neighborhood SES and the PA environment are defined and operationalized heterogeneously across the literature27 and could operate differently across the urban-to-rural continuum. Because associations between the PA environment and risk of T2D vary across study designs, populations, and geographies, as evidenced by systematic reviews,17–20 23 harmonized analytic approaches are necessary to reduce heterogeneity between studies and to generate epidemiologic inferences that can inform population-level interventions and policies. In this study by the Diabetes LEAD Network, we conducted complementary analyses across three unique study samples in diverse geographies in the USA to evaluate whether lower access to neighborhood resources for LTPA, measured as the density of PA facilities or distance to parks, partially mediates the overall harmful associations previously observed between poor neighborhood SES and risk of T2D.


Research design and methods

Overall approach

The Diabetes LEAD Network is a Centers for Disease Control and Prevention (CDC)-funded research collaboration of the following academic centers: Drexel University, Geisinger and Johns Hopkins University, New York University Grossman School of Medicine, and the University of Alabama at Birmingham.28 The goal of the Network is to identify modifiable community-level determinants of T2D and cardiometabolic conditions using electronic health records (EHRs) and survey data from across the USA. Led by the Drexel University Data Coordinating Center, the Diabetes LEAD Network partners conducted analyses designed to reduce, where possible, between-study heterogeneity and to aid interpretation of results. First, we created harmonized measurements of key neighborhood-level variables, including SES, PA, land use, racial/ethnic composition, and community type. Second, we developed complementary analysis plans to estimate mediation effects within three independent study samples with different populations, study designs, and geographic contexts, described in detail below. Baseline address data were geocoded and used to link individuals to census tract-level community features.

Veterans Administration Diabetes Risk (VADR) cohort study

The VADR cohort study enrolled veterans in the national Veterans Health Administration EHR for primary care with sufficient residential address information within 2 years after cohort entry (1 January 2008).29 Participants were either (1) T2D-free at cohort entry, with at least two T2D-free primary care visits at least 30 days apart within any 5-year period since 1 January 2003, or (2) subsequently enrolled and T2D-free at cohort entry through 31 December 2016. Prevalent and incident T2D diagnoses were defined using encounter diagnoses, medication orders, and laboratory test results before and during the study period (online supplemental table S1). Participants were followed from the date of cohort entry until the date of incident T2D or censoring at death, after 2 years without an EHR encounter, or at the end of the study (31 December 2018).


REasons for Geographic and Racial Differences in Stroke (REGARDS) cohort study

The REGARDS study enrolled adults aged 45 years and older from the contiguous USA at baseline (2003–2007), with oversampling in the Southeast.30 This analysis included 11 208 study participants who were free of T2D at baseline and who completed a follow-up exam in 2013–2016. Incident T2D was defined as fasting glucose ≥126 mg/dL, random glucose ≥200 mg/dL, or use of T2D medication at the follow-up exam (online supplemental table S1). Follow-up time was measured as the time between the baseline and follow-up home visits.

Geisinger EHR nested case–control study

Geisinger is an integrated health system that provides primary, specialty, urgent, and emergency healthcare services at community practice clinics and hospitals in central and northeastern Pennsylvania. The design of the nested case–control study of new-onset T2D in the Geisinger EHR has been described previously.31 Between 2008 and 2016, individuals with T2D (n=15 888) were identified using T2D encounter diagnoses, medication orders, and laboratory test results (online supplemental table S1). To ensure that we could detect T2D if present, we required at least two encounters on different days with a primary care provider before case or control selection. To exclude prevalent T2D, we required individuals to have at least one encounter with the health system without evidence of T2D at least 2 years before the T2D onset date. We randomly selected, with replacement, five control encounters for each case (n=79 435, with 65 084 unique persons) from individuals who never met any of the criteria used to define T2D cases; controls were frequency matched to cases on age, sex, and year of encounter.
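The control sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: records are plain dictionaries with hypothetical keys `age_group`, `sex`, and `year`, and age is assumed to be matched in groups. Drawing with replacement within each matching stratum, in proportion to the number of cases in that stratum, is what allows the same person to contribute more than one control encounter (79 435 control encounters from 65 084 unique persons).

```python
import random
from collections import Counter


def sample_frequency_matched_controls(cases, pool, ratio=5, seed=0):
    """Draw `ratio` control encounters per case, with replacement, from
    never-T2D encounters in the same (age_group, sex, year) stratum.

    `cases` and `pool` are lists of dicts; the keys are hypothetical.
    """
    rng = random.Random(seed)

    def stratum(rec):
        return (rec["age_group"], rec["sex"], rec["year"])

    # Index eligible control encounters by matching stratum
    eligible = {}
    for rec in pool:
        eligible.setdefault(stratum(rec), []).append(rec)

    # Frequency matching: each stratum gets ratio x (number of cases in it)
    cases_per_stratum = Counter(stratum(c) for c in cases)
    controls = []
    for key, n_cases in cases_per_stratum.items():
        candidates = eligible.get(key)
        if not candidates:
            continue  # no eligible controls in this stratum
        # random.Random.choices samples with replacement
        controls.extend(rng.choices(candidates, k=ratio * n_cases))
    return controls
```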

Harmonized community type

The distributions of the primary exposure of interest, area-level neighborhood socioeconomic disadvantage, and of the mediators of interest, density of PA facilities and distance to parks, vary considerably across urban, suburban, and rural community contexts. We assigned census tracts to one of four community types (higher density urban, lower density urban, suburban/small town, and rural) using a previously described modification of the US Department of Agriculture Rural-Urban Commuting Area methodology that better differentiates urban cores from surrounding non-rural areas.32 We stratified all analyses by community type because of concerns about census tract-level residual confounding and positivity violations, which can lead to bias when exposure values do not overlap across strata of confounding variables.

Harmonized neighborhood socioeconomic environment (NSEE)

As part of the Diabetes LEAD Network, we developed a harmonized definition of area-level socioeconomic disadvantage relevant across the USA, hereafter referred to as the neighborhood socioeconomic environment (NSEE). Based on the work of Xiao and colleagues,33 we defined NSEE as the sum of z-scores of six census tract variables (percentage of persons with less than a high-school education, persons unemployed, households earning less than $30 000/year, population with income below the poverty level, households on public assistance, and occupied housing units with no car), drawn from the 2000 decennial Census and the 2006–2010 5-year American Community Survey (ACS). Census data from 2000 were converted to 2010 tract boundaries using the interpolation tool from Brown University's Longitudinal Tract Database.34 In VADR, participants who entered the cohort before 2010 were assigned NSEE based on 2000 Census data; all others were assigned NSEE based on 2006–2010 ACS data. In REGARDS, participants were assigned NSEE based on 2000 Census data. In Geisinger, cases and matched controls from 2008–2012 were assigned NSEE based on the 2000 Census, while those from 2013–2016 were assigned NSEE based on 2006–2010 ACS data. Higher NSEE z-scores indicated greater disadvantage. To improve the interpretability of comparisons within a community type and across studies, we rescaled NSEE to range from 0 to 100 and categorized it into quartiles separately within each community type.
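A minimal sketch of how a z-score sum index of this kind might be computed, rescaled to 0 to 100, and split into community-specific quartiles. The field names are hypothetical, and the sketch assumes that the z-scores and the rescaling are computed over whatever set of tracts is passed in; the text does not state the reference population used for standardization. Ties at a quartile cut point are assigned to the lower quartile, which is an arbitrary choice.

```python
from bisect import bisect_left
from statistics import mean, pstdev, quantiles

# Hypothetical field names for the six tract-level percentages
NSEE_VARS = ("pct_lt_high_school", "pct_unemployed", "pct_hh_income_lt_30k",
             "pct_below_poverty", "pct_hh_public_assistance", "pct_hu_no_car")


def nsee_index(tracts):
    """Return a 0-100 NSEE score per tract (higher = more disadvantage).

    Assumes each indicator varies across the supplied tracts (nonzero SD).
    """
    zsum = [0.0] * len(tracts)
    for var in NSEE_VARS:
        vals = [t[var] for t in tracts]
        mu, sd = mean(vals), pstdev(vals)
        # Add this indicator's z-score to each tract's running sum
        for i, x in enumerate(vals):
            zsum[i] += (x - mu) / sd
    # Min-max rescale the z-score sum to 0-100
    lo, hi = min(zsum), max(zsum)
    return [100 * (z - lo) / (hi - lo) for z in zsum]


def quartiles_by_community_type(scores, community_types):
    """Assign quartiles 1-4 separately within each community type."""
    out = [0] * len(scores)
    for ctype in set(community_types):
        idx = [i for i, c in enumerate(community_types) if c == ctype]
        cuts = quantiles([scores[i] for i in idx], n=4)  # three cut points
        for i in idx:
            out[i] = 1 + bisect_left(cuts, scores[i])
    return out
```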

Harmonized density of PA facilities within street network buffers

We defined PA facility density as the 5-year average (ending the year before study entry or selection) of the number of commercial fitness or recreation facilities per square kilometer within a street network buffer centered on the population-weighted centroid of a census tract. We obtained annual data (1998–2014) on PA facilities from the Retail Environment and Cardiovascular Disease (RECVD) study,35 which classified health-related neighborhood amenities using the National Establishment Time Series (NETS) Database. Details on the classification methods and the NETS data, which were based on information collected by Dun and Bradstreet (D&B, Short Hills, New Jersey, USA), have been described elsewhere.35 The RECVD team re-geocoded the data to improve locational accuracy and recategorized facilities using Standard Industrial Classification codes and word or name searches to enhance classification. Using ESRI ArcGIS 9.3, street network buffers of different sizes were calculated depending on the LEAD community type: 1 mile (1.6 km) walking buffers in higher density urban, 2 mile (3.2 km) driving buffers in lower density urban, and 6 mile (9.7 km) driving buffers in suburban/small town and rural communities. Buffer sizes were selected based on prior literature measuring the commercial PA facility environment, although few past studies could inform appropriate buffer sizes in rural areas.36–40
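The density calculation can be illustrated with a short sketch. It assumes that the facility count inside each tract's buffer and the buffer's area have already been computed in GIS software (a street network buffer is an irregular polygon, so its area is not simply that of a circle with the buffer radius); the function and dictionary names are hypothetical.

```python
# Primary street network buffer sizes by LEAD community type (miles), from the text
BUFFER_MILES = {
    "higher density urban": 1,  # walking buffer
    "lower density urban": 2,   # driving buffer
    "suburban/small town": 6,   # driving buffer
    "rural": 6,                 # driving buffer
}
KM_PER_MILE = 1.609344


def buffer_km(community_type):
    """Buffer size in kilometers for a community type."""
    return BUFFER_MILES[community_type] * KM_PER_MILE


def pa_facility_density(annual_counts, buffer_area_km2, entry_year):
    """5-year average PA facilities per km2, ending the year before entry.

    annual_counts: {year: facilities inside the tract's buffer}; every one
    of the five years must be present (a KeyError flags missing data).
    buffer_area_km2: area of the street network buffer polygon from GIS.
    """
    years = range(entry_year - 5, entry_year)  # e.g. 2003-2007 for 2008 entry
    densities = [annual_counts[y] / buffer_area_km2 for y in years]
    return sum(densities) / len(densities)
```

For example, a tract whose buffer covers 5 km2 and contained 10, 10, 10, 20, and 20 facilities in the five years before a 2008 entry would have an average density of 2.8 facilities per km2.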

Harmonized population-weighted distance to seven closest parks

We defined neighborhood spatial access to parks for each census tract using a population-weighted measure of the seven closest parks, as described previously.25 41 Briefly, this measure accounts for the distance (miles) to and size of the seven closest parks while adjusting for the heterogeneous population distribution within a census tract. Data on national, state, and local parks in 2010, provided by the CDC Division of Population Health, were derived from two sources: the Homeland Security Infrastructure Program Gold database and the ESRI ArcGIS 10.1 Data DVD.
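The cited measure also accounts for park size, and its exact formula is given in the referenced work rather than here. The sketch below is therefore only a simplified, distance-only illustration of the population-weighting idea: it assumes the tract is divided into smaller populated units (for example, census blocks) with projected coordinates in miles, averages each unit's straight-line distance to its k nearest parks, and weights those averages by population.

```python
import math


def pop_weighted_park_distance(units, parks, k=7):
    """Simplified population-weighted mean distance to the k nearest parks.

    units: list of (x, y, population) for populated sub-areas of one tract
    parks: list of (x, y) park locations; coordinates are projected, in miles
    Park size is ignored here, unlike the measure cited in the text.
    """
    total_pop = sum(pop for _, _, pop in units)
    weighted_sum = 0.0
    for ux, uy, pop in units:
        # Straight-line distance from this unit to every park, nearest first
        dists = sorted(math.hypot(ux - px, uy - py) for px, py in parks)
        nearest = dists[:k]
        weighted_sum += pop * sum(nearest) / len(nearest)
    return weighted_sum / total_pop
```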

Harmonized census tract-level covariates

As potential confounders, we measured the percentage of Hispanic residents, the percentage of non-Hispanic Black residents, and the land use environment within each census tract. We obtained race and ethnicity data from the 2000 and 2010 decennial censuses. To measure the land use environment, we used factor scores from a multiple-group confirmatory factor analysis, stratified by community type, that identified a single latent variable from seven components of the built environment: average block length, average block size, intersection density, street connectivity, establishment density, percent developed land, and household density. Data were derived from ESRI 2009 Vintage Street and computed in ArcGIS Pro 2.3. Details of the land use measurement model have been described previously.42 The z-score sums within each community type were scaled to range from 0 to 100, with higher values indicating a more walkable neighborhood.

Individual-level covariates

Individual-level covariates, including age, sex, race and ethnicity, smoking status, and available proxies for individual or family SES, were defined within each study. Age was defined at baseline in VADR and REGARDS and at the time of case or control selection in Geisinger. We considered race and ethnicity, a social construct correlated with social determinants of health, as a proxy for race-based discrimination. Race and ethnicity categories were defined within each study (VADR: non-Hispanic White, non-Hispanic Black, Hispanic, Asian, and other/unknown; REGARDS: non-Hispanic White vs non-Hispanic Black; Geisinger: non-Hispanic Black, non-Hispanic White, or other race/ethnicity). SES was defined by baseline disability and income in VADR (disabled, low-income/non-disabled, or neither), by baseline household income in REGARDS (<$20 000, $20 000–$34 999, $35 000–$74 999, ≥$75 000, or refused to answer), and by receipt of medical assistance before case or control selection (0% vs >0% of the time before selection) in Geisinger.

Statistical analysis

Using a counterfactual framework, we estimated the average natural direct effect of NSEE on risk of T2D (the effect not operating through the hypothesized mediator) and the average natural indirect effect of NSEE on risk of T2D operating through the LTPA environment, with each mediator modeled separately and with a priori adjustment for individual-level and census tract-level confounders (online supplemental figure S1). When interpretable (ie, when direct and indirect effects were in the same direction), we calculated the proportion mediated (ie, the proportion of the total effect operating through the indirect pathway of interest).43 No sample size calculations were conducted.
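On an additive scale the total effect (TE) decomposes into the natural direct (NDE) and natural indirect (NIE) effects, TE = NDE + NIE, so the proportion mediated is NIE / (NDE + NIE). The small helper below is an illustrative sketch of that rule, not the authors' code: it returns nothing when the two effects have opposite signs (inconsistent mediation), which is when the text above treats the proportion as uninterpretable. For odds ratios, as in the Geisinger analysis, one common convention is to apply the same formula to log ORs, because the natural effects combine multiplicatively on the OR scale.

```python
def proportion_mediated(nde, nie):
    """Proportion of the total effect (NDE + NIE) operating through the mediator.

    Effects must be on an additive scale (e.g. incidence rate differences, or
    log ORs). Returns None when NDE and NIE have opposite signs (inconsistent
    mediation) or when the total effect is zero, since the ratio is then not
    interpretable.
    """
    if nde * nie < 0:
        return None
    total = nde + nie
    if total == 0:
        return None
    return nie / total
```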

In the VADR cohort study, we first examined exposure–outcome and mediator–outcome associations using piecewise exponential survival models44 with 2-year intervals and county random effects, fit in R (V.4.04) using the ‘lme4’ R package.45 We used county rather than census tract random effects because of convergence issues when clustering by census tract. We used generalized linear mixed-effects Poisson regression models with a log link function and an offset of the logarithm of time at risk during each interval to estimate HRs, assuming a constant hazard within each interval. We estimated total, direct, and indirect effects46 as differences in 2-year T2D incidence rates, with quasi-Bayesian approximation 95% CIs, at each NSEE quartile compared with the first quartile, using the ‘mediation’ R package.47 Differences in incidence rates were multiplied by 100 for interpretability.
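To show what a piecewise exponential model with 2-year intervals does with follow-up time, the sketch below splits each participant's follow-up into person-period rows and computes the crude event rate in each interval. Without covariates, events divided by person-time in each interval is the maximum likelihood estimate from a Poisson model with interval indicators and a log(time at risk) offset. The county random effects, NSEE, and confounders used in the actual VADR models are omitted, and the function names are hypothetical.

```python
def split_person_time(followup_years, had_event, width=2.0):
    """Split one participant's follow-up into consecutive `width`-year intervals.

    Returns (interval_index, time_at_risk, event) tuples; an event, if any,
    is assigned to the final interval, where follow-up ends.
    """
    rows = []
    start, j = 0.0, 0
    while start < followup_years:
        stop = min(start + width, followup_years)
        is_last = stop >= followup_years
        rows.append((j, stop - start, int(had_event and is_last)))
        start, j = stop, j + 1
    return rows


def interval_rates(participants, width=2.0):
    """Crude constant hazard per interval: events / person-time.

    participants: iterable of (followup_years, had_event) pairs.
    """
    events, person_time = {}, {}
    for followup, had_event in participants:
        for j, t, d in split_person_time(followup, had_event, width):
            events[j] = events.get(j, 0) + d
            person_time[j] = person_time.get(j, 0.0) + t
    return {j: events[j] / person_time[j] for j in sorted(person_time)}
```

For instance, a participant followed for 5 years who develops T2D contributes rows of 2, 2, and 1 years, with the event counted only in the third interval.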

In the REGARDS cohort study, we examined exposure–outcome associations using Poisson mixed models with robust variance estimation to account for correlation of participants within census tracts, and exposure–mediator associations using linear mixed models with robust variance estimation in R (V.4.04) using the ‘lme4’ package.45 We estimated total, direct, and indirect effects46 as differences in T2D incidence rates, with bootstrapped 95% CIs, at each NSEE quartile compared with the first quartile, using the ‘mediation’ R package.47 Differences in incidence rates were multiplied by 100 for interpretability.

In the Geisinger EHR case–control study, we estimated total, direct, and indirect effects as ORs using logistic regression models, with bootstrapped 95% CIs,48 at each NSEE quartile compared with the first quartile, in R (V.4.04) using the ‘medflex’ R package.49 We used this package because it allowed us to fit models appropriate for our study design and to use tract-level bootstrap resampling to account for clustering within census tracts.
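Tract-level bootstrap resampling can be sketched as below. This is a generic cluster bootstrap with a percentile interval, offered as an illustration under stated assumptions rather than a reproduction of medflex internals: whole census tracts are drawn with replacement, all records in a drawn tract are kept together so that within-tract correlation is preserved, and `estimator` is a hypothetical placeholder for refitting the mediation model on each resample.

```python
import random
from statistics import quantiles


def cluster_bootstrap_ci(records, cluster_key, estimator, n_boot=1000, seed=0):
    """Percentile 95% CI from resampling whole clusters (e.g. census tracts).

    records: list of dicts; cluster_key: field holding the tract identifier;
    estimator: function mapping a list of records to a scalar estimate.
    """
    rng = random.Random(seed)
    clusters = {}
    for rec in records:
        clusters.setdefault(rec[cluster_key], []).append(rec)
    ids = list(clusters)
    estimates = []
    for _ in range(n_boot):
        # Draw as many tracts as observed, with replacement, keeping each
        # drawn tract's records together to preserve within-tract correlation
        drawn = rng.choices(ids, k=len(ids))
        sample = [rec for cid in drawn for rec in clusters[cid]]
        estimates.append(estimator(sample))
    cuts = quantiles(estimates, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]
```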

Final models were adjusted for Network-harmonized census tract-level race/ethnicity and land use environment (factor score) and for study-specific age, sex, race/ethnicity, smoking status (REGARDS and Geisinger only), and SES. We treated age as a continuous variable. All VADR models included linear and quadratic age terms, while REGARDS and Geisinger models included a quadratic age term only if it was statistically significant (p<0.05). In VADR, we did not adjust for smoking status because of the large proportion of missing data (66%) at cohort entry.

Sensitivity analyses

We conducted sensitivity analyses to evaluate the robustness of the main findings. First, we refitted mediation models for PA facility density with larger street network buffers (2 mile driving buffers in higher density urban, 6 mile driving buffers in lower density urban, and 10 mile driving buffers in suburban/small town and rural communities). Second, we removed the land use environment from models because this variable could be on the causal pathway. Third, we took advantage of the national VADR cohort to restrict VADR models to census tracts shared with either REGARDS or Geisinger. Although effects across studies cannot be quantitatively compared because of differences in study design and analytic approach, we qualitatively evaluated the consistency (eg, presence vs absence) of mediation results. Because the geographically restricted VADR models examined a different population within the same census tracts, we hypothesized that comparing them with the primary REGARDS or Geisinger results could indicate whether discrepancies in primary findings across sites were partly attributable to differences in geographic context or in population composition. Fourth, we conducted a post hoc analysis of the REGARDS rural community model in which we treated PA facility density as a binary mediator, dichotomized at the median, to investigate whether the mediation via PA facility density observed in REGARDS rural communities could be due to misspecification of the mediator as a continuous variable.
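For the fourth sensitivity analysis, a median split can be sketched as follows. Whether values equal to the median were coded as high or low is not stated in the text; this sketch assumes they are coded 0. One consequence worth noting: when more than half of tracts have no PA facilities, the median is 0, and the split reduces to comparing tracts with any facilities against tracts with none.

```python
from statistics import median


def dichotomize_at_median(values):
    """Code each value 1 if it is above the sample median, else 0.

    Values equal to the median are coded 0 (an assumption).
    """
    m = median(values)
    return [int(v > m) for v in values]
```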


Results

Participant characteristics

Demographic characteristics and geographic coverage of participants in the three study samples are summarized in table 1 (individual level) and table 2 (tract level). Briefly, the VADR cohort included more than 4.1 million veterans (539 369 incident T2D cases) in 71 835 census tracts across most of the continental USA. The REGARDS cohort included 11 208 participants (1409 incident T2D cases) in 7502 census tracts, roughly half of them in the Southeast. The Geisinger nested case–control study sample included 15 888 new-onset T2D cases and 79 435 controls in 785 census tracts in northeastern and central Pennsylvania. The VADR and REGARDS cohorts had median follow-up of 5.0 and 9.5 years, respectively, while the median duration of contact with the Geisinger EHR was 11.2 years.

Table 1

Baseline individual-level characteristics of three independent study samples in the Diabetes LEAD Network

Table 2

Baseline census tract-level characteristics of three independent study samples in the Diabetes LEAD Network

The Geisinger and REGARDS study samples had approximately equal proportions of males and females. The VADR cohort, reflecting overall veteran demographics, was 7.8% female. The average REGARDS participant was older than VADR or Geisinger participants (63.0 vs 59.4 and 54.9 years, respectively). The REGARDS cohort had a greater proportion of non-Hispanic Black participants (32.8%) than the VADR and Geisinger study samples (16.0% and 1.8%, respectively). Within community type, median NSEE values did not vary substantially across study samples (online supplemental figure S2). The VADR cohort, the most geographically diverse study sample, had a wider range of NSEE values and a longer upper tail (greater socioeconomic disadvantage) than the other study samples. PA facility density generally declined from urban to rural community types, while distance to parks generally increased (online supplemental figures S3 and S4, respectively). Both LTPA environment measures varied more in the VADR cohort than in the other study samples.

Mediation via PA facility density

We found limited evidence for mediation of the effect of NSEE on T2D via PA facility density in the VADR cohort (of small magnitude) and in the REGARDS cohort, and no evidence of mediation in the Geisinger sample (table 3). The most consistent effects were observed in rural communities, where the indirect effects were positive and strengthened across increasing quartiles of NSEE, and where the proportion mediated was 3% in VADR and 51% in REGARDS (comparing the fourth vs first quartile of NSEE). In all other community types, the exposure–outcome relations and the directions of the indirect effects were inconsistent. In suburban/small town communities, we observed negative indirect effects in VADR (ie, inconsistent mediation opposing the direct effect) and positive indirect effects in REGARDS (ie, mediation in the same direction as the direct effect), although in REGARDS the statistically significant positive indirect effect was limited to the comparison of the second vs first quartile of NSEE. In urban communities, we observed statistically significant, although less consistent, indirect effects in the VADR cohort only. In VADR lower density urban communities, positive indirect effects in the second and third (vs first) quartiles of NSEE represented a proportion mediated of about 1%, whereas in higher density urban communities, indirect effects were negative.

Table 3

Total, direct, and indirect effects of higher NSEE (higher socioeconomic disadvantage) on T2D via PA facility density within street network buffers

Mediation via distance to parks

We found limited evidence for mediation of the effect of NSEE on T2D via distance to parks, in the VADR cohort only (table 4) and specifically in rural communities. In these communities, the population-weighted distance to the seven closest parks mediated a small proportion (<1%) of the total effect of NSEE on T2D comparing the fourth versus first quartile of NSEE. In the VADR cohort, there was minimal evidence for mediation in suburban/small town and lower density urban communities and no evidence for mediation in higher density urban communities.

Table 4

Total, direct, and indirect effects of higher NSEE (higher socioeconomic disadvantage) on T2D via population-weighted distance to seven nearest parks

Sensitivity analyses

In analyses using larger buffer sizes to define PA facility density, results for REGARDS, but not for VADR or Geisinger, differed from the primary analysis (online supplemental table S2). In REGARDS, larger buffers yielded slightly larger indirect effects in lower density urban, suburban/small town, and rural communities. Removing the land use environment measure from mediation models did not substantially affect the magnitude or direction of the indirect effects (results not shown). When we restricted VADR analyses to REGARDS census tracts, we did not observe the statistically significant indirect effects via PA facility density that were present in REGARDS suburban/small town and rural communities (online supplemental table S3). For mediation via distance to parks, the restricted VADR results did not differ qualitatively in any substantial way from the main REGARDS analysis (online supplemental table S3). In VADR models restricted to Geisinger census tracts, the indirect effects via both PA facility density and distance to parks (online supplemental table S4) were not qualitatively different from those in the main Geisinger analysis (ie, close to null). In the sensitivity analysis treating PA facility density as a binary mediator, the indirect effect in REGARDS rural communities was slightly attenuated but remained statistically significant (online supplemental table S5).


Discussion

In a causal mediation analysis of three independent study samples, with harmonized exposure and neighborhood-level confounding variables, and geographic coverage across rural, suburban, and urban communities in the US, we found little evidence that the LTPA environment mediates the association between NSEE and new onset of T2D. We observed some statistically significant mediation via PA facility density and distance to parks; however, the magnitude of the indirect effects was small, especially in the VADR cohort, and in most models, the LTPA environment mediated only a small proportion of the effect of NSEE on T2D. Overall, the observed indirect effects through the LTPA environment are of insufficient magnitude to inform policy decisions aimed at reducing the impact of NSEE on T2D. Despite the lack of evidence for policy-relevant indirect effects via the LTPA environment, our heterogeneous mediation findings by community type affirm that future research should consider community type as a key modifying factor by which neighborhood context affects T2D.

Mediation via PA facility density

In non-rural community types, findings were inconsistent in direction across study samples and provided little evidence that PA facility density explains NSEE’s harmful effects on T2D. In rural communities, we found some evidence for mediation of NSEE on T2D via PA facility density, with a small magnitude and small proportion mediated in VADR and a relatively large proportion mediated in REGARDS. Although no prior studies of mediation of NSEE-T2D associations via access to PA facilities exist, evidence for relations between neighborhood SES and access to PA facilities24 26 50 51 and between access to PA facilities and T2D21 23 52 supports the plausibility of this mediation pathway. Because most studies of PA facility access and T2D risk have been limited to urban and suburban areas, our finding of mediation in rural communities, which have fewer PA facilities than more urbanized areas,26 provides some evidence that the harmful effect of NSEE on T2D risk may be partly due to low access to PA facilities in these communities. However, because we observed a substantial proportion mediated in only one study sample, and because of limitations in the supporting data, this finding should be interpreted with caution. In REGARDS rural communities, the distribution of PA facility density was relatively narrow and skewed (ie, many census tracts had no PA facilities), especially among tracts with high NSEE. In addition, confounding by individual-level factors is more likely when a study population has few persons per census tract; of the three study samples, REGARDS had the fewest persons per census tract. To address concerns about the skewed distribution of PA facility density, we conducted a sensitivity analysis in which PA facility density was treated as a binary, rather than continuous, mediator. Indirect effects remained statistically significant, suggesting that this association was not due to misspecification of the exposure–mediator or mediator–outcome exposure–response relations. While this bolsters the evidence for the mediation finding, our sensitivity analysis restricting VADR to census tracts overlapping with the other study samples suggested that the mediation effects observed in REGARDS may reflect differences in population composition rather than in context. The reason for the lack of mediation in Geisinger remains unclear but could reflect overall regional differences in T2D risk12 or the lower T2D risk in rural areas within the Geisinger study region.31

In our main analysis, we measured PA facility density in the two types of urban communities using street network buffer sizes comparable to those in past studies of PA facility density and T2D. These studies have typically used 1-mile circular buffers.21 53 Few prior studies could guide our choice of street network buffers in rural areas, so we selected 6-mile driving buffers in suburban/small town and rural communities. In a sensitivity analysis, expanding the buffers used to measure PA facility density produced slightly larger indirect effects in REGARDS in most community types. This suggests that the relevant activity space for LTPA resources may be larger in less dense, car-dependent communities. Alternatively, larger buffers may have captured additional PA facilities in adjacent communities, increasing the effect sizes observed in the REGARDS cohort. There is no consensus in the literature on the most relevant activity spaces for LTPA across the rural-to-urban continuum. Empirical studies measuring how these activity spaces differ across rural, suburban, and urban areas are therefore an important area for future research. In addition, future mediation analyses should evaluate other aspects of the LTPA environment, such as the quality or affordability of PA facilities.
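The paragraph above contrasts network buffers of different sizes. The following minimal sketch shows what counting facilities within a street network buffer involves and why a larger buffer can reach facilities farther along the road network, such as those in adjacent communities. It is not the study's GIS workflow. Several details are assumptions: the origin (for example, a tract centroid) and each facility are already snapped to their nearest network nodes, edge lengths are in miles, and only a raw count is returned, because this section does not say how counts were normalized into a density. The function names and the toy network are hypothetical.

```python
import heapq
from collections import defaultdict


def network_distances(edges, source, cutoff):
    """Dijkstra shortest-path distances from `source`, ignoring paths longer than `cutoff`.

    edges: iterable of (u, v, length) tuples describing an undirected street network.
    Returns {node: network distance} for every node reachable within `cutoff`.
    """
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, length in graph[node]:
            nd = d + length
            if nd <= cutoff and nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def facilities_in_network_buffer(edges, origin_node, facility_nodes, buffer_miles):
    """Hypothetical helper: count facilities whose snapped node lies within the network buffer."""
    reachable = network_distances(edges, origin_node, buffer_miles)
    return sum(1 for node in facility_nodes if node in reachable)


if __name__ == "__main__":
    # Toy road network (miles) with one facility at each of B, C, D, and E.
    roads = [("A", "B", 0.5), ("B", "C", 0.7), ("C", "D", 3.0), ("D", "E", 2.5)]
    facilities = ["B", "C", "D", "E"]
    print(facilities_in_network_buffer(roads, "A", facilities, buffer_miles=1.0))
    print(facilities_in_network_buffer(roads, "A", facilities, buffer_miles=6.0))
```

In the toy network, the 1-mile buffer reaches only the facility at B, while the 6-mile buffer also reaches C and D. Network distance, rather than straight-line distance, determines which facilities are counted.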

Mediation via distance to parks

We found only a small positive indirect effect (<1% mediated) via population-weighted distance to the seven closest parks, and it was limited to VADR rural communities. No prior studies have evaluated whether access to parks mediates the relation between NSEE and T2D. However, mediation via park access is plausible given evidence of associations between NSEE and park access and between park access and T2D. Prior work used the same measure as our study: the population-weighted distance from census tract centroid to the seven closest parks. It showed that higher-poverty tracts in rural areas were farther from parks, whereas in urban and suburban areas, higher-poverty tracts were closer to parks.25 Few studies have specifically examined park access, but related exposures such as greenspace and public open spaces have been associated with lower risk of T2D.54 It is also possible that our measure of park access does not influence individual LTPA as we hypothesized. Our distance-to-parks measure had several strengths. It accounted for the population distribution within a census tract and for park size, and it was not constrained by tract boundaries. Nevertheless, our largely null mediation results could indicate that it is not an appropriate measure of park access for these populations or geographies. A systematic review of the relation between park proximity and density and LTPA found inconsistent associations across studies.20 This suggests that perceived measures of park access, rather than the objective measures used in our study, may be more relevant to LTPA. Future studies of mediation via the LTPA environment should examine other features that could be more relevant to increasing LTPA, such as affordability, amenities, and quality.21 26
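The following minimal sketch illustrates one plausible way to build a population-weighted distance to the k closest parks. It is an assumption-laden illustration, not the definition used in the cited work (reference 25), which this section does not reproduce. It assumes that a tract's population is represented by populated sub-units (for example, census blocks) with planar coordinates in miles, and that parks are represented as points. Parks outside the tract are allowed, mirroring the statement that the measure was not constrained by tract boundaries. The sketch does not reproduce the park-size adjustment the text mentions. All names are hypothetical.

```python
import heapq
import math


def mean_distance_to_k_nearest(point, parks, k=7):
    """Mean straight-line distance from `point` to its k nearest park points."""
    if not parks:
        raise ValueError("no parks supplied")
    nearest = heapq.nsmallest(k, (math.dist(point, park) for park in parks))
    return sum(nearest) / len(nearest)


def population_weighted_park_distance(blocks, parks, k=7):
    """Hypothetical tract-level measure: population-weighted mean of block-level
    distances to the k nearest parks.

    blocks: list of ((x, y), population) pairs for populated sub-units of one tract.
    parks: list of (x, y) park locations; parks outside the tract are allowed.
    """
    total_pop = sum(pop for _, pop in blocks)
    if total_pop == 0:
        raise ValueError("tract has no population")
    weighted = sum(pop * mean_distance_to_k_nearest(xy, parks, k) for xy, pop in blocks)
    return weighted / total_pop


if __name__ == "__main__":
    # Toy tract: a small block near the parks and a larger block far from them.
    parks = [(1, 0), (2, 0), (3, 0)]
    blocks = [((0, 0), 100), ((10, 0), 300)]
    print(population_weighted_park_distance(blocks, parks, k=2))
```

Weighting by population means that a tract whose residents mostly live far from parks scores as distant, even if part of the tract lies next to a park. A simple centroid-to-park distance would not capture this.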


This study is the first to use a formal causal mediation framework to examine whether features of the LTPA environment mediate the observed harmful association between NSEE and T2D. Collaboration within the Diabetes LEAD Network made possible the main strength of this analysis: complementary analysis plans across sites. These plans included harmonized analytic variables (exposure and key neighborhood-level variables) and statistical modeling kept as consistent as possible despite differences in study designs. This allowed us to address an often-cited limitation in comparing epidemiologic study findings, namely, the extent to which methodological heterogeneity contributes to lack of replication across studies. Furthermore, our study samples represented both national and region-specific contexts and a broad spectrum of community types, which allowed us to explore heterogeneity across these dimensions.


Although our methodological approach eliminated important sources of variation among the three analyses, replication in different study samples and geographies does not allow full disentanglement of contextual and compositional effects. Our study samples differed in their distributions of race/ethnicity, sex, age, SES, and veteran status, and in the geographic regions covered. Residual confounding may have remained, for example from individual-level SES, which was measured differently in the three study samples. We were unable to completely harmonize the diagnostic criteria for T2D across samples, which could have contributed to the heterogeneity of results. For some community types, the REGARDS and Geisinger samples had small numbers of participants in particular combinations of NSEE quartiles and PA mediators. This may have limited our ability to detect mediation effects as small as those detected in the substantially larger VADR sample. However, effects of that size may not be meaningful for informing policy or interventions to reduce the impact of NSEE on T2D. The LTPA environment variables were measured from population-weighted census tract centroids rather than individual addresses. They therefore may not capture the relevant activity space of participants. A causal interpretation of these mediation analyses assumes no unmeasured exposure–outcome, exposure–mediator, or mediator–outcome confounding. We stratified statistical models by community type and evaluated NSEE as within-community quartiles. This reduced potential bias from census tract-level residual confounding and non-positivity, but it also restricted our ability to compare mediation effects across community types. Because address data were cross-sectional, we could not address possible bias from residential self-selection, in which health-related behaviors and preferences predict both neighborhood choice and health outcomes.


In the first causal mediation analysis of the NSEE-T2D association through neighborhood LTPA resources, we found little evidence of partial mediation via the density of PA facilities or distance to parks in three independent study samples. The observed indirect effects were small and varied by study sample and community type, so the relevance of our findings for policy or interventions to reduce T2D is not straightforward. Our differential findings by community type highlight the importance of considering community type as a key factor in contextual influences on T2D. By extension, they suggest that efforts to change the LTPA environment to reduce T2D will require local tailoring to maximize population health benefits. Inequities in local destinations for walking, social engagement, and PA are widening across the socioeconomic gradient and by race and ethnicity in the USA.55 Our understanding of how best to measure the LTPA environment in rural communities is particularly limited. Research evaluating alternative measures of the LTPA environment is needed, particularly measures that capture features beyond proximity and that operate at multiple spatial scales.

Data availability statement

Data are available on reasonable request. Deidentified data are available on request with IRB approval and a data use agreement.

Ethics statements

Patient consent for publication

Ethics approval

The study was approved by the institutional review board of each participating institution: Geisinger Institutional Review Board (2017-0534); Drexel University Institutional Review Board (1707005552); New York University School of Medicine Institutional Review Board (i17-01428); Subcommittee for Human Studies, Research and Development Committee, Department of Veterans Affairs, New York Harbor Healthcare System (01667); and University of Alabama at Birmingham (IRB-300000957). The Geisinger Institutional Review Board approved this study and waived informed consent. Data were collected for non-research purposes. Consent was waived for two reasons. First, requiring patients to come in for a visit to review and sign a consent form would be burdensome, because some patients are not seen regularly. Second, contacting thousands of patients to participate in the study would be impractical for both patients and study staff. For NYU and the VADR cohort, the request for a waiver or alteration of the informed consent requirement for all aspects of the study was approved. All REGARDS participants provided written informed consent and signed medical record release forms allowing REGARDS investigators to retrieve medical records for research purposes.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Contributors KM accepts full responsibility for the finished work and/or the conduct of the study, had access to the data, and controlled the decision to publish.

    KM and SO: writing (original draft, review and editing draft), statistical analysis and interpretation of statistical analysis; CN: writing (review and editing draft), statistical analysis and interpretation of statistical analysis; AZ, JU, PL and VR: writing (review and editing draft), statistical analysis and interpretation of results; MS: writing (review and editing draft); AGH, BSS, APC, and DLL: conceptualization, funding acquisition, writing (review and editing draft), statistical analysis and interpretation of results; MM: writing (review and editing draft), data curation; JB: writing (review and editing draft); GSL: conceptualization, funding acquisition, writing (review and editing draft); SA: writing (review and editing draft); RK: writing (review and editing draft); SA: writing (review and editing draft); GI: writing (review and editing draft); MP: conceptualization, funding acquisition, writing (review and editing draft), statistical analysis and interpretation of results.

  • Funding This research was conducted by the Diabetes LEAD Network, funded by the Centers for Disease Control and Prevention cooperative agreements U01DP006293 (Drexel University), U01DP006296 (Geisinger-Johns Hopkins University), U01DP006299 (New York University School of Medicine), and U01DP006302 (University of Alabama at Birmingham), in collaboration with the US CDC Division of Diabetes Translation. This research used data created by The Retail Environment and Cardiovascular Disease (RECVD) study. RECVD was supported by the National Institute on Aging (NIA) (1R01AG049970, 3R01AG049970-04S1); the Commonwealth Universal Research Enhancement (C.U.R.E) program, funded by the Pennsylvania Department of Health (2015 Formula award, SAP #4100072543); the Urban Health Collaborative at Drexel University; and the Built Environment and Health Research Group at Columbia University. The REasons for Geographic and Racial Differences in Stroke (REGARDS) project was supported by cooperative agreement U01 NS041588, cofunded by the National Institute of Neurological Disorders and Stroke and the NIA, National Institutes of Health, Department of Health and Human Services.

  • Disclaimer The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

  • Competing interests DLL received investigator-initiated research support from Amgen, Inc for work unrelated to this manuscript. All other authors declare that they have no conflicts of interest.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.