Application of machine learning techniques to understand ethnic differences and risk factors for incident chronic kidney disease in Asians
1. Cynthia Ciwei Lim¹
2. Feng He²
3. Jialiang Li³
4. Yih Chung Tham²,⁴
5. Chieh Suai Tan¹
6. Ching-Yu Cheng²,⁴
7. Tien-Yin Wong²,⁴,⁵
8. Charumathi Sabanayagam²,⁴,⁵
1. Department of Renal Medicine, Singapore General Hospital, Singapore
2. Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
3. Department of Statistics and Applied Probability, National University of Singapore, Singapore
4. Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, Singapore
5. Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Correspondence to Dr Charumathi Sabanayagam; charumathi.sabanayagam@seri.com.sg

## Abstract

Introduction Chronic kidney disease (CKD) is increasing in Asia, but there are sparse data on incident CKD among different ethnic groups. We aimed to describe the incidence and risk factors associated with CKD in the three major ethnic groups in Asia: Chinese, Malays and Indians.

Research design and methods Prospective cohort study of 5580 general population participants aged 40–80 years (2234 Chinese, 1474 Malays and 1872 Indians) who completed both baseline and 6-year follow-up visits. Incident CKD was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m2 at follow-up in those free of CKD at baseline.

Results The 6-year incidence of CKD was highest among Malays (10.0%), followed by Chinese (6.1%) and Indians (5.8%). Logistic regression showed that older age, diabetes, higher systolic blood pressure and lower eGFR were independently associated with incident CKD in all three ethnic groups, while hypertension and cardiovascular disease were independently associated with incident CKD only in Malays. Two machine learning approaches, gradient boosted machine and random forest, identified the same factors as the most important for incident CKD. Adjustment for clinical and socioeconomic factors reduced the excess incidence in Malays by 60% compared with Chinese but only 13% compared with Indians.

Conclusion Incidence of CKD is high among the main Asian ethnic groups in Singapore, ranging between 6% and 10% over 6 years; differences were partially explained by clinical and socioeconomic factors.

• cohort studies
• ethnicity
• renal insufficiency, chronic
• kidney diseases

## Data availability statement

Data are available upon reasonable request. As the study involves human participants, the data cannot be made freely available in the manuscript, the supplemental files, or a public repository due to ethical restrictions. Nevertheless, the data are available from the Singapore Eye Research Institutional Ethics Committee for researchers who meet the criteria for access to confidential data. Interested researchers can send data access requests to the Singapore Eye Research Institute using the following email address: seri@seri.com.sg.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

### Significance of this study

#### What is already known about this subject?

• Aging populations and a greater prevalence of metabolic risk factors such as diabetes and hypertension have contributed to increasing chronic kidney disease (CKD), but it is unknown whether the risk of incident CKD and its contributory factors differ among the major ethnic groups in Asia.

#### What are the new findings?

• The 6-year incidence of CKD in Chinese, Malays and Indians was 6.1%, 10% and 5.8%, respectively. Older age, diabetes, higher systolic blood pressure and lower eGFR were independently associated with incident CKD in all ethnicities, while hypertension and cardiovascular disease were associated with incident CKD only in Malays.

#### How might these results change the focus of research or clinical practice?

• Significant ethnic disparities in incident CKD in Asians were partially explained by clinical and socioeconomic factors that can be targeted to reduce incident CKD.

## Introduction

Chronic kidney disease (CKD) is recognized as a global health burden.1 Aging populations and a greater prevalence of metabolic risk factors such as diabetes and hypertension have contributed to increasing CKD worldwide, especially in Asia.2 3 In the Global Burden of Disease Study 2017,2 global life expectancy increased by 7.4 years, from 65.6 years in 1990 to 73.0 years in 2017. CKD contributed to one of the largest increases in disability-adjusted life years,2 and CKD-related deaths increased by 40% among those aged 50–69 years and by 42% among those aged 70 years and older.4 CKD is also costly to patients and society.5 Among individuals with CKD in the USA, the median total direct and out-of-pocket healthcare expenditures were $12 877 and $1439, respectively,5 more than five times those of individuals without CKD. Hence, the incidence of CKD in the general population needs to be established to better anticipate and prepare for the challenges that CKD brings to the healthcare system. Annualized incident CKD rates in the general population are estimated at 0.7%–1.2% in North America and Europe,6–9 with fewer studies in Asia. While some East Asian countries such as Taiwan, Korea and Japan have reported rates of 0.9%–2%,10–13 there are scant data for other ethnic groups such as Malays. Other than a hospital-based study of CKD progression in type 2 diabetes,14 data on incident CKD in Singapore are sparse.

An earlier cross-sectional study reported that CKD prevalence differed among the three ethnic groups in Singapore, being higher in Malays and Indians.15 Ethnic disparities in CKD prevalence (eg, between white, Hispanic and black people) have also been reported in North America,16 where they were attributed to genetic differences and/or socioeconomic barriers to accessing healthcare.16 17 However, CKD risk and its contributory factors appear to differ between Asian and Caucasian populations, so findings from previous studies may not be generalizable to Asians.18 There is growing recognition that addressing these disparities in health outcomes requires a culturally competent healthcare system that first acknowledges the differences and their underlying reasons, and then adapts services to meet the unique needs of the population.17 To address these gaps, we aimed to describe and compare the incidence of CKD and the factors associated with it in the three major ethnic groups in Asia and Singapore: Chinese, Malays and Indians. Furthermore, to better understand risk factors for incident CKD and possible ethnic differences, we applied machine learning techniques in addition to standard logistic regression (LR) models.

## Research design and methods

### Study population

The Singapore Epidemiology of Eye Diseases (SEED) Study is a large population-based prospective cohort study of Chinese, Malay and Indian adults aged 40–80 years at baseline.19 It combines three independent studies conducted by the Singapore Eye Research Institute: the Singapore Malay Eye Study (2004–2006), the Singapore Indian Eye Study (2007–2009) and the Singapore Chinese Eye Study (2009–2011). Detailed methodology for these studies has been reported previously.20–22 In brief, age-stratified random sampling from computer-generated random lists of individuals aged 40–80 years residing in the same geographical area in Singapore generated a sampling frame of 6350 Chinese, 5600 Malays and 6350 Indians. A total of 10 033 participants (3353 Chinese, 3280 Malays and 3400 Indians) attended the baseline visit, and 6762 (78.8%) returned for the follow-up visit.19 For this study, we included participants who attended both baseline and 6-year follow-up visits. After excluding those with missing eGFR values at baseline or follow-up (n=597), those with prevalent CKD at baseline (n=524) and those with missing data on key covariates including hypertension, body mass index (BMI), lipid profile, current smoking status, alcohol consumption and education category (n=61), 5580 SEED participants were included in the current analysis. Figure 1 shows the selection of the SEED participants included in the analysis.
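As a quick consistency check, the exclusion cascade described above can be reconciled in a few lines of Python. This sketch uses only the counts stated in the text; the variable names are illustrative.

```python
# Reconcile the SEED participant flow using counts reported in the text.
returned_for_follow_up = 6762  # attended both baseline and follow-up visits

# Sequential exclusions, as reported.
exclusions = {
    "missing eGFR at baseline or follow-up": 597,
    "prevalent CKD at baseline": 524,
    "missing key covariates": 61,
}

analytic_sample = returned_for_follow_up - sum(exclusions.values())

# Ethnic composition of the analytic sample, as reported.
by_ethnicity = {"Chinese": 2234, "Malay": 1474, "Indian": 1872}

# Baseline participation by ethnicity, as reported.
baseline = {"Chinese": 3353, "Malay": 3280, "Indian": 3400}

print(analytic_sample)             # 5580
print(sum(by_ethnicity.values()))  # 5580
print(sum(baseline.values()))      # 10033
```

The ethnic counts sum to the same 5580 obtained by subtracting the exclusions, and the baseline counts sum to the reported 10 033.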

Figure 1

Flow diagram of participant exclusion. CKD, chronic kidney disease; eGFR, estimated glomerular filtration rate.

### Data collection

An interviewer-administered questionnaire was used to collect participants’ sociodemographic (age, gender), socioeconomic (highest education attained), lifestyle (current smoking, alcohol consumption) and medical history data, as previously described.23 Physical examination included height, weight and blood pressure (BP) measurements. We calculated BMI as weight in kilograms divided by height in meters squared. Obesity was defined as BMI ≥25 kg/m2. Hypertension was defined as systolic BP ≥140 mm Hg, diastolic BP ≥90 mm Hg, self-reported physician-diagnosed hypertension or use of BP-lowering therapy. Among those with hypertension, BP control was defined as BP <140/90 mm Hg.24 Diabetes mellitus was defined as random serum glucose ≥11.1 mmol/L, glycated hemoglobin (HbA1c) ≥6.5%, self-reported physician-diagnosed diabetes or use of glucose-lowering treatment.25 Among those with diabetes, glycemic control was defined as HbA1c <7%.26 Cardiovascular disease was defined as self-reported myocardial infarction, angina or stroke. Non-fasting serum lipids, glucose, HbA1c and creatinine were measured. Dyslipidemia was defined as total cholesterol ≥6.2 mmol/L, low-density lipoprotein cholesterol ≥4.1 mmol/L, high-density lipoprotein cholesterol <1.0 mmol/L, self-reported physician-diagnosed dyslipidemia or use of statin medication. Serum creatinine was measured using an enzymatic method calibrated to the National Institute of Standards and Technology liquid chromatography isotope dilution mass spectrometry method.19 eGFR was calculated using the CKD Epidemiology Collaboration (CKD-EPI) equation.27 Laboratory investigations were conducted at hospitals accredited by the College of American Pathologists.
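The risk-factor definitions above can be expressed as simple predicates. The following is a minimal sketch, assuming that each definition is met when any single listed criterion is satisfied (logical OR); the `Participant` record and all function names are hypothetical and are not taken from the SEED analysis code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    """Hypothetical record holding the baseline measures used in the definitions."""
    weight_kg: float
    height_m: float
    sbp: float                       # systolic BP, mm Hg
    dbp: float                       # diastolic BP, mm Hg
    self_reported_htn: bool          # physician-diagnosed hypertension
    on_bp_lowering: bool
    random_glucose: Optional[float]  # mmol/L; None if not measured
    hba1c: Optional[float]           # %; None if not measured
    self_reported_dm: bool           # physician-diagnosed diabetes
    on_glucose_lowering: bool


def bmi(p: Participant) -> float:
    """Weight (kg) divided by height (m) squared."""
    return p.weight_kg / p.height_m ** 2


def is_obese(p: Participant) -> bool:
    return bmi(p) >= 25.0


def has_hypertension(p: Participant) -> bool:
    # Assumption: any single criterion is sufficient (logical OR).
    return p.sbp >= 140 or p.dbp >= 90 or p.self_reported_htn or p.on_bp_lowering


def bp_controlled(p: Participant) -> bool:
    """Applied only among participants with hypertension: BP <140/90 mm Hg."""
    return p.sbp < 140 and p.dbp < 90


def has_diabetes(p: Participant) -> bool:
    # Assumption: any single criterion is sufficient (logical OR).
    high_glucose = p.random_glucose is not None and p.random_glucose >= 11.1
    high_hba1c = p.hba1c is not None and p.hba1c >= 6.5
    return high_glucose or high_hba1c or p.self_reported_dm or p.on_glucose_lowering


def glycemic_controlled(p: Participant) -> bool:
    """Applied only among participants with diabetes: HbA1c <7%."""
    return p.hba1c is not None and p.hba1c < 7.0


example = Participant(weight_kg=70, height_m=1.65, sbp=145, dbp=85,
                      self_reported_htn=False, on_bp_lowering=False,
                      random_glucose=7.0, hba1c=6.0,
                      self_reported_dm=False, on_glucose_lowering=False)
print(round(bmi(example), 1), is_obese(example))          # 25.7 True
print(has_hypertension(example), bp_controlled(example))  # True False
print(has_diabetes(example))                              # False
```

The hypothetical participant meets the obesity threshold used here (BMI ≥25 kg/m2) and the systolic BP criterion for hypertension, but none of the diabetes criteria.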

Participants gave written informed consent before enrolment.

### Outcome definition

Incident CKD was defined when individuals with eGFR ≥60 mL/min/1.73 m2 at enrollment subsequently had eGFR <60 mL/min/1.73 m2 at follow-up. The reduction in eGFR at follow-up was calculated as a percentage of the baseline eGFR at enrollment, that is, ((eGFR at baseline – eGFR at follow-up) / eGFR at baseline) *100%.
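The outcome depends on eGFR computed from serum creatinine, followed by a threshold rule and a percentage change. The following is a sketch that assumes reference 27 is the 2009 CKD-EPI creatinine equation and that creatinine is recorded in µmol/L (converted to mg/dL by dividing by 88.4). The race coefficient is included only for completeness of that equation, and the function names are hypothetical.

```python
def ckd_epi_2009(scr_umol_per_l: float, age: float, female: bool,
                 black: bool = False) -> float:
    """CKD-EPI 2009 creatinine equation; returns eGFR in mL/min/1.73 m2."""
    scr = scr_umol_per_l / 88.4  # umol/L -> mg/dL
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141.0
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # race coefficient of the 2009 equation
    return egfr


def is_incident_ckd(egfr_baseline: float, egfr_follow_up: float,
                    threshold: float = 60.0) -> bool:
    """Free of CKD at baseline (eGFR >=60) and eGFR <60 at follow-up."""
    return egfr_baseline >= threshold and egfr_follow_up < threshold


def percent_egfr_reduction(egfr_baseline: float, egfr_follow_up: float) -> float:
    """((baseline - follow-up) / baseline) * 100."""
    return (egfr_baseline - egfr_follow_up) * 100.0 / egfr_baseline


print(round(ckd_epi_2009(88.4, age=50, female=False)))                 # 87
print(is_incident_ckd(80.0, 56.0), percent_egfr_reduction(80.0, 56.0))  # True 30.0
```

A participant whose eGFR falls from 80 to 56 mL/min/1.73 m2 therefore counts as an incident case, with a 30% reduction from baseline.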

### Statistical analysis

Baseline characteristics by ethnicity and incident CKD status were summarized as mean (SD), median (IQR) or count (percentage) and compared using the Mann-Whitney test (ie, two-sample Wilcoxon test) or Fisher’s exact test, as appropriate for the variable. Sociodemographic and clinical characteristics by ethnicity at baseline and follow-up were compared using the Kruskal-Wallis rank sum test or χ2 test, as appropriate. LR was used to calculate age- and sex-adjusted and multivariable-adjusted ORs and 95% CIs for factors associated with incident CKD in each ethnic group, while linear regression was used to evaluate factors associated with the continuous outcome of percentage reduction in eGFR. Covariates were selected based on established prognostic factors reported in the literature.28 To account for attrition bias, we performed a supplementary analysis using inverse probability weighting (IPW) and compared the weighted regression coefficients with the unweighted ones. Statistical significance was defined as a two-sided p value <0.05. To further validate the LR findings and to evaluate the importance of each risk factor for incident CKD, we employed two classic machine learning approaches: gradient boosted machine (GBM)29 and random forest (RF).30 In GBM, the relative influence score measures the proportional contribution of a variable to model performance, with all scores summing to 100%.29 In RF, the mean decrease in accuracy measures the change in prediction accuracy resulting from the exclusion (or permutation) of a variable.30
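The permutation idea behind the RF mean decrease in accuracy can be illustrated with a model-agnostic sketch. R's randomForest computes this measure on each tree's out-of-bag samples; for simplicity, the version below permutes one column at a time in a single evaluation set. The `as_percent` helper only illustrates the sum-to-100 scaling used when reporting GBM relative influence. It does not reproduce how GBM derives influence, which comes from split improvements. The toy data and classifier are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(predict: Callable[[Row], int], X: List[list], y: List[int]) -> float:
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each feature column is randomly permuted."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def as_percent(scores):
    """Rescale non-negative scores so that they sum to 100."""
    total = sum(scores)
    return [100.0 * s / total for s in scores] if total > 0 else [0.0] * len(scores)


# Toy data: column 0 mimics baseline eGFR, column 1 is an irrelevant feature.
X = [[60 + 2 * i, i % 3] for i in range(20)]
y = [1 if row[0] < 75 else 0 for row in X]


def predict(row):
    """Hypothetical fitted classifier that uses only the eGFR-like column."""
    return 1 if row[0] < 75 else 0


imp = permutation_importance(predict, X, y)
print(as_percent(imp))  # the irrelevant feature receives 0.0
```

Because the classifier ignores column 1, permuting that column never changes accuracy, so its importance is exactly zero.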

To evaluate the extent to which clinical, metabolic (cardiovascular disease, dyslipidemia, diabetes, hypertension, systolic BP), socioeconomic (education) and behavioral (smoking, obesity, diabetes control, BP control) factors may account for the excess CKD risk in the Malay cohort, we calculated the reduction in ORs associated with adjustment for these factors using the formula15

$$\text{Reduction in excess risk (\%)} = \frac{\mathrm{OR}_{\text{model 1}} - \mathrm{OR}_{\text{adjusted}}}{\mathrm{OR}_{\text{model 1}} - 1} \times 100$$

where $\mathrm{OR}_{\text{model 1}}$ is the OR of incident CKD in Malays versus Chinese and Malays versus Indians, adjusted for age and sex only (model 1), and $\mathrm{OR}_{\text{adjusted}}$ is the OR from the further adjusted models 2 (model 1 plus clinical and metabolic factors), 3 (model 1 plus socioeconomic and behavioral factors) and 4 (all factors).
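The percentage of excess risk explained, computed as 100 × (OR_model1 − OR_adjusted) / (OR_model1 − 1), can be implemented as a one-line function. This sketch uses a hypothetical function name and hypothetical ORs, not the study's estimates. It rejects a model 1 OR of 1 or below because no excess risk exists to be explained in that case.

```python
def percent_excess_risk_explained(or_model1: float, or_adjusted: float) -> float:
    """100 * (OR_model1 - OR_adjusted) / (OR_model1 - 1).

    or_model1: age- and sex-adjusted OR (model 1) for the ethnic contrast.
    or_adjusted: OR for the same contrast after further adjustment.
    """
    if or_model1 <= 1.0:
        raise ValueError("Model 1 OR must exceed 1 for an excess risk to exist")
    return 100.0 * (or_model1 - or_adjusted) / (or_model1 - 1.0)


# Hypothetical ORs, not study results.
print(round(percent_excess_risk_explained(1.80, 1.32), 1))  # 60.0
```

With these inputs, adjustment removes 0.48 of an excess of 0.80 on the OR scale, which corresponds to 60% of the excess risk.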

Age-standardized prevalence of risk factors and age-standardized CKD incidence were estimated by direct standardization to the population distribution of the 2010 Singapore Census (restricted to Chinese, Malays and Indians who were Singapore citizens or permanent residents aged 40–80 years). Annual incidence was calculated by dividing the number of incident CKD cases by the total person-years of follow-up.
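Direct age standardization weights each age-specific rate by that stratum's share of a standard population, and an incidence rate divides cases by person-time. The following is a minimal sketch. The age bands, rates and population counts are hypothetical placeholders, not the 2010 Singapore Census or SEED data.

```python
from typing import Dict


def direct_age_standardized_rate(stratum_rates: Dict[str, float],
                                 standard_population: Dict[str, int]) -> float:
    """Weighted average of age-specific rates, with weights from a standard population."""
    total = sum(standard_population.values())
    return sum(stratum_rates[age] * n / total
               for age, n in standard_population.items())


def incidence_per_person_year(n_cases: int, person_years: float) -> float:
    return n_cases / person_years


# Hypothetical inputs, NOT the 2010 Singapore Census or SEED data.
rates = {"40-49": 0.02, "50-59": 0.05, "60-69": 0.10, "70-80": 0.20}
census = {"40-49": 400, "50-59": 300, "60-69": 200, "70-80": 100}
print(round(direct_age_standardized_rate(rates, census), 3))  # 0.063
print(incidence_per_person_year(60, 6000.0))                  # 0.01
```

Standardizing every ethnic group to the same census distribution removes differences in age structure, so the standardized rates can be compared directly.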

To establish the proportion of all cases of incident CKD in the total population that could be attributed to exposure to the binary risk factors that were significant in the multivariable model, we estimated the population attributable risks (PARs) due to hypertension, diabetes and cardiovascular disease using Levin’s formula:

$$\mathrm{PAR} = \frac{p(\mathrm{RR} - 1)}{1 + p(\mathrm{RR} - 1)}$$

where $p$ is the age-standardized prevalence of the risk factor in the population and the relative risk (RR) was estimated by the adjusted OR.31
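Levin's formula, PAR = p(RR − 1) / [1 + p(RR − 1)], needs only the exposure prevalence p and the relative risk. The following sketch uses a hypothetical function name and hypothetical inputs. As in the text, the adjusted OR stands in for the RR.

```python
def levin_par(prevalence: float, relative_risk: float) -> float:
    """Levin's population attributable risk: p(RR - 1) / (1 + p(RR - 1))."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical: 30% exposure prevalence, adjusted OR of 2.0 used as the RR.
print(round(100 * levin_par(0.30, 2.0), 1))  # 23.1
```

The PAR is zero when nobody is exposed (p = 0) or when exposure carries no excess risk (RR = 1).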

All analyses were performed using R V.4.0.0 (R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/).

## Results

We identified 5580 individuals (1474 Malays, 2234 Chinese and 1872 Indians) with eGFR ≥60 mL/min/1.73 m2 at baseline. Mean baseline eGFR was lower in Malays (82.9 mL/min/1.73 m2) than in Chinese (92.8 mL/min/1.73 m2) and Indians (91.5 mL/min/1.73 m2). The median (IQR) follow-up was 6.7 (5.7–7.3) years in Malays, 6.0 (5.3–6.5) years in Chinese and 5.9 (5.5–6.6) years in Indians. Metabolic risk factors such as diabetes, hypertension and dyslipidemia were more frequent at follow-up than at baseline in all ethnic groups (online supplemental etable 1), but their relative distributions among the ethnic groups remained similar. At both baseline and follow-up, Malays were more likely than the other groups to be current smokers and to have obesity, hypertension, higher systolic and diastolic BP and lower eGFR, while Indians were more likely to have diabetes and dyslipidemia and had higher glucose levels than the other ethnic groups. Among those with diabetes at baseline, Malays were less likely than the other ethnic groups to have adequate glycemic control and to be taking antidiabetic medications. Among those with hypertension at baseline, Malays were less likely than the other ethnic groups to have BP control and to be taking antihypertensive medications, including ACE inhibitors or angiotensin II receptor blockers.

The 6-year incidence of CKD was highest in Malays (10.0%), followed by Chinese (6.1%) and Indians (5.8%). Similarly, the age-standardized annual incidence was significantly higher in Malays than in Chinese and Indians (online supplemental etable 2). Incident CKD in Malays was also more severe, with eGFR reduced to <30 mL/min/1.73 m2 in 1.2%, compared with 0.2% in Chinese and 0.2% in Indians. Figure 2 shows that the median reduction in eGFR was significantly greater in Malays (33.0%, IQR 22.1%–47.2%) than in Chinese (28.2%, IQR 18.7%–37.9%; p=0.004) or Indians (27.0%, IQR 17.6%–37.6%; p=0.009).

Figure 2

Median reduction in estimated glomerular filtration rate (eGFR) was significantly greater in Malays (33.0%, IQR 22.1%–47.2%) compared with Chinese (28.2%, IQR 18.7%–37.9%) or Indians (27.0%, IQR 17.6%–37.6%).

Table 1 shows the clinical characteristics at baseline stratified by incident CKD in each ethnic group. Compared with those without incident CKD, those with incident CKD were older, had lower education levels and eGFR, and had higher blood glucose, systolic BP and pulse pressure. Diabetes, hypertension and cardiovascular disease were more frequent in those with incident CKD in all three ethnic groups. Chinese and Malays with incident CKD were more likely to be male and obese, and had higher BMI and lower high-density lipoprotein (HDL)-cholesterol than those without incident CKD. Chinese and Indians with incident CKD were more likely to have dyslipidemia and had lower total cholesterol than those without incident CKD. In LR models stratified by ethnicity (table 2), older age, diabetes, higher systolic BP and lower eGFR were independently associated with incident CKD in all three ethnic groups, while hypertension and cardiovascular disease were independently associated with incident CKD only in Malays. Because ‘hypertension’ was a broad categorical variable, there may be residual confounding by BP. Online supplemental etable 3 shows that excluding systolic BP from the LR model resulted in higher adjusted ORs for hypertension in all ethnicities, while the adjusted ORs for the other predictors were similar to those from the LR model that included systolic BP. In the linear regression model (online supplemental etable 4), older age, diabetes, higher systolic BP and lower eGFR remained consistently associated with greater percentage reduction in eGFR in all three ethnic groups. Among the ethnic groups, the association between systolic BP and percentage reduction in eGFR was largest in Malays. In addition, cardiovascular disease was independently associated with greater percentage reduction in eGFR in all three groups, while male gender was significantly associated with greater percentage reduction in eGFR among Chinese and Malays.
Supplementary analysis with IPW identified the same factors with similar risk estimates for both incident CKD and eGFR reduction (data not shown). Both GBM and RF identified eGFR, age, systolic BP and diabetes to be the most important variables in incident CKD prediction (online supplemental efigure 1). GBM also found hypertension and cardiovascular disease to be influential in Malays.

Table 1

Baseline characteristics of SEED participants stratified by ethnicity and incident CKD status

Table 2

Multivariable predictors of incident CKD by ethnicity (n=5580)

Based on age-standardized prevalences among those aged 40–80 years (online supplemental etable 5), the estimated PAR of diabetes for incident CKD was highest among Indians (45.2%), followed by Malays (35.4%) and Chinese (33.2%). Among Malays, the PAR of hypertension (54.7%) was higher than those of diabetes (35.4%) and cardiovascular disease (7.5%). Table 3 shows that the odds of incident CKD were markedly attenuated after adjustment for clinical, metabolic, socioeconomic and behavioural factors when comparing Malays with Chinese, and to a lesser extent when comparing Malays with Indians. Adjustment for all factors reduced the excess incidence in Malays by 64% compared with Chinese but only 19% compared with Indians.

Table 3

Factors affecting the excess incidence of CKD in Malays and Indians compared with Chinese

In the subgroup of 1338 individuals with diabetes, incident diabetic CKD occurred in 208 (15.5%). As in the main analysis, incident diabetic CKD was most frequent in Malays (20.7%), followed by Chinese (17.3%) and Indians (11.5%). Online supplemental etable 6 shows that lower eGFR was an independent predictor of incident diabetic CKD in all ethnic groups. Additionally, older age, hypertension, higher systolic BP, HbA1c and diabetes duration predicted incident diabetic CKD among Malays with diabetes, while higher systolic BP and HbA1c predicted incident diabetic CKD in Indians with diabetes.

## Discussion

In this prospective study of 5580 multiethnic Asians in the general population with a median follow-up of 6.1 years, incident CKD was more severe and more frequent in Malays compared with Chinese and Indians. Older age, diabetes, higher systolic BP and lower eGFR were independently associated with incident CKD in all three ethnic groups, while hypertension and cardiovascular disease were independently associated with incident CKD only in Malays. The estimated PAR of diabetes for incident CKD was 45.2% among Indians; while the PAR of hypertension was 54.7% among Malays. Adjustment for clinical, metabolic, socioeconomic and behavioural factors reduced the excess risk in Malays by 64% compared with Chinese but only 19% compared with Indians.

The annualized incident CKD rate was highest among Malays (1.3%), but the rates for all groups were similar to the annualized rates of 0.9%–2% reported by general population studies in Taiwan, Korea and Japan.10–13 While population-based data from other ethnicities such as Malays and Indians are sparse, a retrospective cohort study of 460 individuals with hypertension (25.4% Malays, 50.0% Chinese and 23.5% Indians) from a Malaysian university medical centre’s primary care clinic reported a 10-year CKD incidence of 30.9%.32 Our estimates of crude and age-standardized annual incidence for each ethnicity are useful for gauging the burden of incident CKD in the general population, since an estimated incidence of 1%–1.3% per year would translate to 44 000 incident CKD cases among the resident adult population of 3.2 million.33 Ethnic disparities in CKD prevalence were observed in an earlier, separate multiethnic general population cohort.15 While few Asian studies have compared incident CKD by ethnicity, disparities in kidney disease have been observed in North America, where incident CKD varied by Hispanic/Latino heritage34 and African-Americans suffer disproportionately from kidney disease.16 Measures of socioeconomic status attenuated, but did not eliminate, the association between African-American ethnicity and CKD.16 Likewise, in this study, adjustment for clinical, metabolic, socioeconomic and behavioural factors only partially explained the excess risk of incident CKD in Malays compared with Chinese or Indians.
The remaining excess risk unexplained by the multivariable model may be related to residual confounding from variables not included in this study, including other social determinants of health and genomic differences.16 35 Prior studies have noted ethnic differences in health literacy and health information-seeking behaviours possibly related to language barriers or cultural norms,36 37 which in turn may translate to differences in risk factor awareness and control shown in our study and highlighted in others.38 39 Since interventions improved outcomes in those with low health literacy,40 targeted strategies such as patient education and health policy change will be required to reduce incident CKD.41

Diabetes causes microvascular disease that leads to glomerular hyperfiltration with subsequent glomerulosclerosis, tubulointerstitial inflammation and fibrosis, and is an established risk factor for progressive kidney disease.23 35 Thus, it was unsurprising that incident CKD among participants with diabetes was two to three times that of the general cohort in all ethnic groups. Similarly, an analysis of 34 international cohorts from the CKD Prognosis Consortium reported incident CKD in 14.9% of over 4 million participants without diabetes during a mean follow-up of 4.2 years, and in 40% of 781 627 participants with diabetes during a mean follow-up of 3.9 years.28 In our study, older age, higher systolic BP and lower eGFR were also independently associated with incident CKD in all three ethnic groups. These results were consistent across traditional LR and both machine learning models. Hypertension and cardiovascular disease were identified as important variables for incident CKD in Malays by both LR and GBM, but not by RF. The machine learning models complement LR, which assesses associations in a unit-dependent manner and does not account for differences in variable ranges or categories. In contrast, both GBM and RF quantify the effect of a variable as its overall contribution to model performance, which is unit-free and comparable across variables with different ranges or categories. For example, our LR model showed eGFR, age, systolic BP and diabetes to be significant for CKD prediction, but it was machine learning that identified eGFR as the most influential of the four. Hence, machine learning provided a simple and intuitive measure for comparing CKD risk factors, which is not directly achievable in LR, where the OR and the p value need to be considered simultaneously.
While machine learning methods can capture non-linear relationships and interactions among variables,30 42 their performance for disease risk modelling may not be superior to that of traditional LR,43 especially when there are few variables and the sample size is small.44 Using traditional LR, the CKD Prognosis Consortium similarly found that among participants without diabetes, older age, lower eGFR, hypertension and cardiovascular disease were associated with increased risk of incident CKD.28 Other risk factors identified by the Consortium but not significantly associated with incident CKD in our study were female gender, ever smoking and BMI.28 While obesity was associated with incident CKD in Chinese in the univariate analysis, the association was lost after adjusting for all other factors. In contrast, a systematic review of 39 cohorts including 630 677 participants with a mean follow-up of 6.8 years found that obesity was associated with an increased risk of incident CKD (pooled relative risk 1.28, 95% CI 1.07 to 1.54).45 In the diabetes subgroup, lower eGFR was an independent predictor of incident diabetic CKD in all ethnic groups. Additionally, older age, hypertension, higher systolic BP, HbA1c and diabetes duration predicted incident diabetic CKD among Malays with diabetes, while higher systolic BP and HbA1c predicted incident diabetic CKD in Indians with diabetes. These findings were similar to those of the CKD Prognosis Consortium.28 Other risk factors identified by the Consortium but not significantly associated with incident diabetic CKD in our study were female gender, cardiovascular disease and BMI.28

This study has some limitations. CKD was defined based on eGFR, as in the majority of studies on incident CKD,45 while albuminuria was not included in the definition of baseline or incident CKD because urine albumin was measured in only a third of the Malay participants at baseline (those with known diabetes and one in five of those without diabetes) and was not quantified during follow-up. Incident CKD was defined using a single laboratory measurement, which may overestimate incident CKD in the absence of a repeat measurement 3 months apart. However, the aforementioned systematic review found no difference between studies with and without repeated measurements of serum creatinine.45 Antihypertensive medication type, dose and duration were not evaluated as factors for the outcome, because information on the type of antihypertensive was incomplete and dose and duration were not assessed. Because this study necessarily included only participants who attended and had renal function tests at both baseline and follow-up visits, there may be loss-to-follow-up and survival bias. As participants were 40–80 years old, some may have died or developed significant disability that prevented attendance at the follow-up visit. Compared with individuals who did not return for the follow-up visit, participants who returned were younger, more likely to be female, Chinese and to have attained secondary school education or above, and less likely to be Malay or Indian, to have diabetes, hypertension or cardiovascular disease, or to be current smokers (online supplemental etable 7). They also had lower systolic and diastolic BP, glucose and glycated hemoglobin, and higher eGFR. Thus, both loss to follow-up and survival bias may have led to an underestimate of incident CKD.
However, the estimated annual incidence of 1.17% in our cohort was similar to the annualized rates reported by general population studies in other Asian countries.10–13 Additionally, the supplementary analysis using IPW to account for attrition identified the same risk factors, with similar risk estimates, as the main analysis. The analysis of excess risk may be biased by unmeasured confounders or by residual confounding from measured variables,46 and confounder-mediator confounding was not accounted for. In addition, the PAR assumes a causal relationship between the risk factor and the outcome, as well as independence of the risk factors. Hence, the validity of Levin’s formula for estimating PARs may be questioned where the exposure-disease association is confounded.47 Since the incidence was <10% in Chinese and Indians and 10% in Malays, the adjusted OR was used to approximate the adjusted relative risk (RR) in an alternative expression that remains valid in the presence of confounding.47 These PARs were similar to the original estimates, while those obtained using results from the multivariable LR model48 49 were more conservative (online supplemental etable 8). Although the PAR is an epidemiological measure of the public health impact of risk factor exposure in the population, in reality a risk factor is unlikely to be completely eradicated. Instead, other measures such as the generalized impact fraction can estimate the fractional reduction in cases that would result from reducing the prevalence of a risk factor.48 50
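The paragraph above mentions an alternative PAR expression that remains valid under confounding, as well as the generalized impact fraction. The following is a minimal sketch of both for a single binary exposure. It assumes the alternative expression is Miettinen's case-based formula, pc(RR − 1)/RR, where pc is the proportion of cases exposed; the text does not name it. The impact fraction for a fall in prevalence from p to p* is written here as (p − p*)(RR − 1)/[1 + p(RR − 1)]. Function names and inputs are hypothetical.

```python
def miettinen_paf(prop_cases_exposed: float, relative_risk: float) -> float:
    """Case-based PAF: pc * (RR - 1) / RR, with pc the exposed fraction of cases."""
    return prop_cases_exposed * (relative_risk - 1.0) / relative_risk


def impact_fraction(p_current: float, p_counterfactual: float,
                    relative_risk: float) -> float:
    """Generalized impact fraction for a binary exposure.

    Fraction of cases avoided if exposure prevalence falls from p_current
    to p_counterfactual: (p - p*)(RR - 1) / (1 + p(RR - 1)).
    """
    return ((p_current - p_counterfactual) * (relative_risk - 1.0)
            / (1.0 + p_current * (relative_risk - 1.0)))


# Hypothetical inputs, not study estimates.
print(miettinen_paf(0.5, 2.0))                     # 0.25
print(round(impact_fraction(0.30, 0.15, 2.0), 3))  # 0.115
```

Setting the counterfactual prevalence to zero reduces the impact fraction to Levin's PAR. A partial reduction, such as halving prevalence from 30% to 15% here, averts correspondingly fewer cases.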

In conclusion, our prospective population-based cohort study in Singapore demonstrated significant ethnic disparities in incident CKD in Asians that were partially explained by clinical, socioeconomic and behavioural factors using traditional LR and machine learning techniques. These findings may have important implications in terms of informing policy development and resource allocation in a culturally competent healthcare system to target risk factors that will bring about the greatest reduction in incident CKD.


## Ethics statements

### Ethics approval

The study (SEED) was conducted according to the Declaration of Helsinki and was approved by the Singapore Eye Research Institute Review Board and the SingHealth Centralised Institutional Review Board (2018/2717, 2018/2921, 2018/2006, 2018/2594, 2018/2570, 2015/2279, 2012/487/A).

## Supplementary Data

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

## Footnotes

• Contributors CS is the guarantor and accepts full responsibility for the work and/or the conduct of the study, had access to the data, and controlled the decision to publish. CCL and CS conceptualized the study and wrote the first draft. FH performed statistical analysis. All authors interpreted the results and approved the final manuscript.

• Funding This study was supported by the National Medical Research Council, NMRC/STaR/016/2013, NMRC/CIRG/1371/2013, NMRC/CIRG/1417/2015 and OFLCG/001/2017.

• Competing interests None declared.

• Provenance and peer review Not commissioned; externally peer reviewed.

• Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
