Prevalence and predictive modeling of undiagnosed diabetes and impaired fasting glucose in Taiwan: a Taiwan Biobank study
  1. Ren-Hua Chung¹
  2. Shao-Yuan Chuang¹
  3. Ying-Erh Chen²
  4. Guo-Hung Li¹
  5. Chang-Hsun Hsieh³
  6. Hung-Yi Chiou¹,⁴
  7. Chao A Hsiung¹
  1. Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan
  2. Department of Risk Management and Insurance, Tamkang University, Taipei, Taiwan
  3. Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
  4. School of Public Health, College of Public Health, Taipei Medical University, Taipei, Taiwan
Correspondence to Dr Ren-Hua Chung; rchung{at}nhri.edu.tw

Abstract

Introduction We investigated the prevalence of undiagnosed diabetes and impaired fasting glucose (IFG) in individuals without known diabetes in Taiwan and developed a risk prediction model for identifying undiagnosed diabetes and IFG.

Research design and methods Using data from a large population-based Taiwan Biobank study linked with the National Health Insurance Research Database, we estimated the standardized prevalence of undiagnosed diabetes and IFG between 2012 and 2020. We used the forward continuation ratio model with the Lasso penalty, modeling undiagnosed diabetes, IFG, and healthy reference group (individuals without diabetes or IFG) as three ordinal outcomes, to identify the risk factors and construct the prediction model. Two models were created: Model 1 predicts undiagnosed diabetes, IFG_110 (ie, fasting glucose between 110 mg/dL and 125 mg/dL), and the healthy reference group, while Model 2 predicts undiagnosed diabetes, IFG_100 (ie, fasting glucose between 100 mg/dL and 125 mg/dL), and the healthy reference group.

Results The standardized prevalence of undiagnosed diabetes for 2012–2014, 2015–2016, 2017–2018, and 2019–2020 was 1.11%, 0.99%, 1.16%, and 0.99%, respectively. For these periods, the standardized prevalence of IFG_110 and IFG_100 was 4.49%, 3.73%, 4.30%, and 4.66% and 21.0%, 18.26%, 20.16%, and 21.08%, respectively. Significant risk prediction factors were age, body mass index, waist to hip ratio, education level, personal monthly income, betel nut chewing, self-reported hypertension, and family history of diabetes. The area under the curve (AUC) for predicting undiagnosed diabetes in Models 1 and 2 was 80.39% and 77.87%, respectively. The AUC for predicting undiagnosed diabetes or IFG in Models 1 and 2 was 78.25% and 74.39%, respectively.

Conclusions Our results showed the changes in the prevalence of undiagnosed diabetes and IFG. The identified risk factors and the prediction models could be helpful in identifying individuals with undiagnosed diabetes or individuals with a high risk of developing diabetes in Taiwan.

  • risk assessment
  • pre-diabetic state
  • early diagnosis

Data availability statement

Data are available on reasonable request. Data may be obtained from a third party and are not publicly available. The Taiwan Biobank data can be applied through the Taiwan Biobank.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • It has been estimated that 44.7% of adults aged 20–79 years with diabetes globally are undiagnosed, and that up to 70% of individuals with pre-diabetes will develop diabetes. However, early detection of diabetes or pre-diabetes can prevent disease progression and the development of further complications.

WHAT THIS STUDY ADDS

  • The prevalence of undiagnosed diabetes has not been estimated in a large study in Taiwan, and a risk prediction model designed specifically for undiagnosed diabetes and pre-diabetes has not been developed for the Taiwan population. We estimated the prevalence of undiagnosed diabetes and impaired fasting glucose (IFG) in individuals without known diabetes in Taiwan and constructed a risk prediction model for identifying undiagnosed diabetes and IFG.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Our results document the prevalence of undiagnosed diabetes and IFG in Taiwan between 2012 and 2020, with no obvious increasing or decreasing trend. The identified risk factors and the prediction model could be helpful in identifying individuals with undiagnosed diabetes or individuals with a high risk of developing diabetes in Taiwan.

Introduction

Diabetes, characterized by elevated blood glucose levels, is often associated with complications such as kidney disease, eye damage, and heart and blood vessel diseases.1 Globally, nearly 44.7% of adults aged 20–79 years with diabetes are unaware of their condition (ie, undiagnosed diabetes).2 Further, approximately 35% of newly diagnosed patients with diabetes are discovered to already have developed complications, such as retinopathy, neuropathy and ischemic heart disease.3 Hence, early diagnosis and effective management of diabetes are important to prevent the development of further complications and to reduce the clinical and economic burden.4

Pre-diabetes, characterized by blood glucose levels higher than normal but lower than the thresholds for diabetes,5 is variably defined by different professional organizations, such as the WHO, the American Diabetes Association (ADA), and the International Expert Committee.6 The WHO, for instance, defined pre-diabetes as fasting plasma glucose levels between 110 and 125 mg/dL (ie, impaired fasting glucose (IFG)) or 2-hour plasma glucose using the 75 g oral glucose tolerance test (OGTT) between 140 and 199 mg/dL (ie, impaired glucose tolerance (IGT)). Based on the WHO standard, the International Diabetes Federation (IDF) Diabetes Atlas (10th edition) reported that in 2021, the global prevalence of IFG was between 2.5% and 10% and the global prevalence of IGT was between 5.4% and 12.9%.7 It has been estimated that up to 70% of individuals with pre-diabetes will develop diabetes.5 However, lifestyle intervention can effectively delay or prevent disease progression,8–11 reinforcing the importance of early detection in the pre-diabetes stage.

In 2014, among the newly diagnosed cases of both type 1 and type 2 diabetes in Taiwan, type 2 diabetes constituted a significant majority (99.8%).12 By the same year, the prevalence of type 2 diabetes had risen to 9.32% from 6.38% in 2008.12 A separate study, which analyzed 1096 patients admitted to a specific hospital in Taiwan, found that between 24.5% and 50% of these patients, depending on the type of medical service received, had undiagnosed diabetes.13 The prevalence of undiagnosed diabetes based on large population-based studies in Taiwan has not been reported in the literature. The prevalence of IFG (defined as fasting glucose levels between 100 and 125 mg/dL) in Taiwan was estimated at 16.2%–35.5% for different age groups in the Nutrition and Health Survey of 2013–2016.14

Several prediction models have been constructed for detecting undiagnosed diabetes and pre-diabetes, such as the Danish Risk Score,15 the Leicester Risk Score,16 the Finnish Diabetes Risk Score (FINDRISC),17 the Indian Diabetes Risk Score,18 19 and the Taiwan Diabetes Risk Score (DRS).20 These models, based on simple non-invasive questionnaires, achieved areas under the curve (AUCs) between 67% and 80% for detecting diabetes in different populations. Common predictors used in these models included age, sex, body mass index (BMI), known hypertension, and family history of diabetes. Eleven existing diabetes risk scores, including the Danish Risk Score, FINDRISC, and Taiwan DRS, were evaluated in a Taiwanese cohort, and their AUCs for detecting diabetes ranged between 67% and 77%,20 21 suggesting that commonly used risk predictors for diabetes are also applicable to the Taiwan population.

Studies and national surveys have shown that the prevalence of undiagnosed diabetes in Asians is generally higher than that in other populations.7 22 For example, in the USA between 2017 and 2020, the crude prevalence of undiagnosed diabetes among non-Hispanic Asians was 5.4%, the highest among all ethnicities. This may be due to the lower average BMI in Asians, which leads to less frequent screening for diabetes among Asians.22 Hence, a simple and fast diabetes screening tool with high accuracy would be particularly helpful for the Asian population.

In this study, we investigated the prevalence of undiagnosed diabetes and IFG and the trends in the prevalence over the years in the population without known diabetes in Taiwan, which is the population targeted for diabetes screening. Using a machine learning algorithm, we selected significant prediction variables and constructed risk prediction models for predicting undiagnosed diabetes, IFG, and healthy reference group (ie, individuals without diabetes or IFG) as three ordinal outcomes. This is in contrast to the aforementioned models, which considered binary outcomes (eg, undiagnosed diabetes vs non-diabetes).

Research design and methods

Study participants

The Taiwan Biobank is a population-based study, which has recruited more than 150 000 individuals between the ages of 30 and 70 in Taiwan since 2008.23 Each participant provided information through questionnaires such as self-reported disease status and family history of diseases, and underwent physical examinations and blood and urine tests including fasting glucose and glycated hemoglobin (HbA1c) at recruitment. Approximately 115 000 individuals were genotyped with genome-wide association study (GWAS) single-nucleotide polymorphism (SNP) arrays, which allowed us to perform standard GWAS quality control (QC) procedures, such as removing closely related individuals and those with discordance between self-reported gender and biological sex estimated from the genetic data. Detailed GWAS QC procedures are provided in the online supplemental materials. Individuals who fasted for less than 8 hours were excluded from the analysis.

Thereafter, the individuals were linked to their medical records in the National Health Insurance Research Database (NHIRD) between 2009 and 2020. The NHIRD contains enrollees’ demographic data, medical records, and expenditure claims from outpatient, inpatient, and ambulatory care, and data associated with contracted pharmacies for reimbursement purposes.24 The diagnosis codes (International Classification of Diseases, Ninth Revision (ICD-9) and International Classification of Diseases, Tenth Revision (ICD-10) codes) and prescription drug codes of the Taiwan Biobank samples’ outpatient, ambulatory care, and inpatient services from 3 years before their recruitment were extracted. Based on the diagnosis and drug codes, we further excluded individuals who were using hypoglycemic drugs or who had been diagnosed with hyperthyroidism, Cushing’s syndrome, or acromegaly (common causes of hyperglycemia in Taiwan) before their recruitment.

Outcome definitions

Diabetes was classified into two categories for this study: known and undiagnosed. ‘Known diabetes’ referred to patients who had previously self-reported having type 1, type 2, or gestational diabetes in the questionnaires at recruitment. Additionally, this category also included those with a medical history of diabetes before they were recruited to the Taiwan Biobank. This was determined based on whether the patient had at least three clinical visits or at least one hospitalization within the past year, where diabetes was diagnosed according to ICD-9 codes (beginning with 250) or ICD-10 codes (beginning with E08-E13) from the NHIRD records. Individuals with known diabetes were excluded from our analysis.

On the other hand, “undiagnosed diabetes” referred to individuals who did not fall under the “known diabetes” category as defined above but showed elevated levels of both fasting glucose (≥126 mg/dL) and HbA1c (≥6.5%) based on blood test results taken at the time of their recruitment. This threshold for defining undiagnosed diabetes is consistent with the criteria suggested by Selvin et al25 when fasting glucose and HbA1c measurements are available for each study participant, which is the case with the Taiwan Biobank. The threshold has also been used in estimating the prevalence of undiagnosed diabetes in the USA.26

IFG was defined as individuals without diabetes who had fasting glucose between 110 and 125 mg/dL, according to the WHO standard,27 referred to as IFG_110. We also defined IFG based on the ADA standard (ie, fasting glucose between 100 and 125 mg/dL), referred to as IFG_100. Lastly, the healthy reference group comprised individuals who had neither diabetes nor IFG.
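The outcome definitions above can be summarized in a short Python sketch. It is only an illustration and not the authors' code: the function name and labels are hypothetical, "between 110 and 125 mg/dL" is read as 110 ≤ glucose < 126, and the handling of participants with fasting glucose ≥126 mg/dL but HbA1c <6.5% is not described in the text, so the sketch marks them as "unclassified" by assumption.

```python
def classify_outcome(fasting_glucose: float, hba1c: float,
                     ifg_lower: float = 110.0) -> str:
    """Classify a participant WITHOUT known diabetes (hypothetical helper).

    fasting_glucose is in mg/dL and hba1c in percent.
    ifg_lower = 110 gives IFG_110 (WHO); ifg_lower = 100 gives IFG_100 (ADA).
    """
    # Undiagnosed diabetes requires BOTH criteria to be met.
    if fasting_glucose >= 126.0 and hba1c >= 6.5:
        return "undiagnosed_diabetes"
    # Assumption: the text does not say how this combination is handled.
    if fasting_glucose >= 126.0:
        return "unclassified"
    # "Between ifg_lower and 125 mg/dL", read here as ifg_lower <= glucose < 126.
    if fasting_glucose >= ifg_lower:
        return "IFG"
    return "healthy_reference"
```

With the default threshold, a participant with fasting glucose of 105 mg/dL falls in the healthy reference group, whereas with `ifg_lower=100` the same participant is classified as IFG, which is why the IFG_100 prevalence is much higher than the IFG_110 prevalence.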

Prevalence of undiagnosed diabetes and IFG

The crude prevalence of undiagnosed diabetes was calculated as the proportion of individuals with undiagnosed diabetes among those with undiagnosed diabetes, IFG, or healthy reference group. Similarly, the crude prevalence of IFG was calculated as the proportion of individuals with IFG among those with undiagnosed diabetes, IFG, or healthy reference group. Age-specific and sex-specific crude prevalence was calculated. A standardized prevalence was thereafter calculated using the direct method, based on the 2020 census data of the general Taiwan population as the standard. The census data were obtained from the Department of Household Registration of the Ministry of Interior in Taiwan. We partitioned the samples into four groups, each with a minimum of 10 000 participants, covering a 2- to 3-year span: 2012–2014 for the first group, 2015–2016 for the second, 2017–2018 for the third and 2019–2020 for the fourth group. We calculated each group’s prevalence, enabling us to investigate changes in the prevalence of undiagnosed diabetes and IFG over time.
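As an illustration of the direct method, the following Python sketch weights each stratum-specific crude prevalence by that stratum's share of a standard population. The function name, the stratum keys, and all counts are hypothetical and are not taken from the Taiwan Biobank or the census data.

```python
def direct_standardized_prevalence(cases, totals, standard_pop):
    """Directly standardized prevalence.

    cases, totals: study counts per stratum (eg, age group by sex).
    standard_pop: standard-population counts for the same strata.
    """
    standard_total = sum(standard_pop.values())
    prevalence = 0.0
    for stratum, standard_n in standard_pop.items():
        if totals[stratum] == 0:
            raise ValueError(f"no participants in stratum {stratum!r}")
        crude = cases[stratum] / totals[stratum]   # stratum-specific prevalence
        weight = standard_n / standard_total       # share of the standard population
        prevalence += crude * weight
    return prevalence


# Hypothetical two-stratum example (counts are invented for illustration).
cases = {"30-49": 1, "50-70": 4}
totals = {"30-49": 100, "50-70": 100}
standard = {"30-49": 3000, "50-70": 1000}
standardized = direct_standardized_prevalence(cases, totals, standard)
```

In this invented example the crude prevalence is 5/200 = 2.5%, while the standardized value is lower because the standard population gives more weight to the younger, lower-prevalence stratum.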

Risk prediction models for undiagnosed diabetes and IFG

We compiled a list of 150 variables from the Taiwan Biobank survey data including basic information such as age, sex, and BMI; health behaviors such as drinking, smoking, and physical activity; female-specific variables such as age of menarche and number of pregnancies; and self-reported diseases (including first-degree relatives) such as hypertension, hyperlipidemia, and glaucoma, which can all be completed by self-assessment of an individual at home. Variables with missing rates >10% were excluded. The mice package in R,28 which implements the multivariate imputation by chained equations, was used to impute the missing values in the remaining variables.
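The missing-rate filter can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R workflow: the function name and the record layout (one dictionary per participant, with `None` marking a missing value) are hypothetical, and the imputation step performed with the mice package is not reproduced.

```python
def variables_to_keep(records, max_missing_rate=0.10):
    """Return the variables whose missing rate does not exceed the limit.

    records: list of dicts, one per participant; None or an absent key
    counts as a missing value (hypothetical data layout).
    """
    if not records:
        raise ValueError("no records supplied")
    variables = sorted({name for record in records for name in record})
    kept = []
    for name in variables:
        missing = sum(1 for record in records if record.get(name) is None)
        # Variables with a missing rate > max_missing_rate are excluded.
        if missing / len(records) <= max_missing_rate:
            kept.append(name)
    return kept
```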

We randomly divided 80% of the samples into a training dataset and the remaining 20% as a testing dataset. The outcomes were considered as ordinal (healthy reference group, IFG, and undiagnosed diabetes). Two models were constructed: Model 1 considered the healthy reference group, IFG_110, and undiagnosed diabetes; Model 2 considered the healthy reference group, IFG_100, and undiagnosed diabetes. The forward continuation ratio model with the Lasso penalty implemented in the R package glmnetcr29 was applied to the training dataset to select significant risk predictors. The hyperparameter in the Lasso penalty (the λ value) was selected based on the best-fitted model using the Bayesian Information Criterion. These procedures for model training were performed on the training dataset. The significant risk predictors were used to create the final prediction model. The probability of undiagnosed diabetes (p1) and the probability of undiagnosed diabetes or IFG (p2) were calculated from the final model using the R package VGAM.30 The probability p1 was used to predict undiagnosed diabetes versus non-diabetes (including IFG and healthy reference group), and p2 was used to predict undiagnosed diabetes or IFG versus healthy reference group using the testing dataset.
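To make the link between the ordinal model and the two screening probabilities concrete, the following Python sketch computes p1 and p2 under one common continuation ratio parameterization. It is an assumption-laden illustration rather than the fitted model: it assumes a logit link, coefficients shared across the two steps, and the form logit P(Y > k | Y ≥ k) = α_k + xβ with the ordering healthy reference (1) < IFG (2) < undiagnosed diabetes (3). The parameterization and sign convention actually used by glmnetcr and VGAM may differ, and all names and coefficient values are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probabilities(x, beta, alpha1, alpha2):
    """Return (P(healthy), P(IFG), P(undiagnosed diabetes)).

    Assumed model: logit P(Y > k | Y >= k) = alpha_k + sum(x_i * beta_i), k = 1, 2.
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    beyond_healthy = _sigmoid(alpha1 + eta)   # P(Y > 1): IFG or diabetes
    beyond_ifg = _sigmoid(alpha2 + eta)       # P(Y > 2 | Y >= 2)
    p_healthy = 1.0 - beyond_healthy
    p_ifg = beyond_healthy * (1.0 - beyond_ifg)
    p_diabetes = beyond_healthy * beyond_ifg
    return p_healthy, p_ifg, p_diabetes


def screening_probabilities(x, beta, alpha1, alpha2):
    """Return (p1, p2): P(undiagnosed diabetes) and P(undiagnosed diabetes or IFG)."""
    _, p_ifg, p_diabetes = category_probabilities(x, beta, alpha1, alpha2)
    return p_diabetes, p_ifg + p_diabetes
```

Because the three category probabilities sum to one, p2 is always at least as large as p1, which is consistent with the larger cut-off thresholds reported for p2 than for p1.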

AUC was calculated to evaluate the performance of the models. Optimal cut-off values to determine the sensitivities and specificities were selected using the Youden index based on the receiver operating characteristics (ROC) curves. Models 1 and 2 were also applied to predict undiagnosed diabetes or IFG+/HbA1c+ (fasting glucose between 110 and 125 mg/dL and HbA1c between 6.0% and 6.4% as defined by Washirasaksiri et al31) versus healthy reference group, allowing us to evaluate the performance of the trained models for different definitions of pre-diabetes. It is of interest to note that the IFG+/HbA1c+ subgroup has been shown to have a high risk of 5-year diabetes incidence, making it important to evaluate how our model performed for identifying this subgroup.
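The AUC and the Youden-index cut-off can be illustrated with a small dependency-free Python sketch. It is not the authors' implementation: the function names and the example scores are hypothetical, a participant is predicted positive when the score is at or above the threshold, and the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counted as one half).

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each observed score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cases and non-cases are required")
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1.0)


def auc(scores, labels):
    """Rank-based AUC; tied case/non-case pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented predicted probabilities and outcomes (1 = case, 0 = non-case).
example_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
example_labels = [0, 0, 1, 0, 1, 1, 1]
threshold, sensitivity, specificity = youden_cutoff(example_scores, example_labels)
```

Selecting the cut-off on one dataset and then applying it unchanged to another, as done here for the external validation, usually gives somewhat lower sensitivity and specificity in the second dataset.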

External validation analysis

We used the CardioVascular Disease risk FACtors Two-township Study (CVDFACTS), which is a community-based cohort study, to validate the prediction results. CVDFACTS investigates risk factors for cardiovascular diseases in Taiwan.32 Approximately 6000 individuals were recruited between 1991 and 1993 in two towns, Chu-Dung and Pu-Tzu. Individuals who had a history of stroke, had fasted for less than 8 hours, or were not covered by the National Health Insurance were excluded at baseline. Several follow-up surveys and examinations were conducted, and this study used the fifth follow-up data, which were collected between 1999 and 2002. Data from individuals aged between 30 and 70 years were extracted, which resulted in a total of 1481 samples for the analysis. The significant risk prediction variables selected from the Taiwan Biobank samples were extracted from the CVDFACTS survey and examination data. Known diabetes in CVDFACTS was defined as self-reported diabetes. HbA1c was not measured in CVDFACTS; hence, undiagnosed diabetes was defined as individuals who were not known to have diabetes but had fasting glucose ≥126 mg/dL. The same definitions used for the Taiwan Biobank samples were applied to define IFG and the healthy reference group. The risk prediction models constructed using the Taiwan Biobank training dataset were applied to the CVDFACTS samples. Sensitivities and specificities were calculated using the optimal cut-off values selected for the testing dataset from the Taiwan Biobank.

Results

Figure 1 shows our analysis flowchart. After the sample QC, 64 875 individuals remained for the analyses. Table 1 shows the characteristics of the four sample groups stratified by years. There were 12 572, 22 295, 15 990, and 14 018 individuals without known diabetes for the groups of 2012–2014, 2015–2016, 2017–2018, and 2019–2020, respectively. All groups comprised a majority of females (approximately 63%–66%). The mean BMI, fasting glucose level, and HbA1c and the proportion of self-reported hypertension were all higher in males compared with females.

Figure 1

Flowchart of our analysis steps. GWAS, genome-wide association study; NHIRD, National Health Insurance Research Database.

Table 1

Characteristics of participant groups divided by the year of recruitment (2- to 3-year intervals)

Online supplemental figure S1 shows the age-specific and sex-specific crude prevalence of undiagnosed diabetes in the four groups. Generally, males had higher rates of undiagnosed diabetes than females within groups. The rates generally increased with age, except for males aged between 50 and 59 in the years 2012–2014 and 2019–2020, who showed the highest rates compared with other age groups. Online supplemental figures S2 and S3 show the age-specific and sex-specific crude prevalence of IFG_110 and IFG_100 in the four groups. Males also had higher rates of IFG than females within groups, and these rates also increased with age.

Figure 2 shows the standardized prevalence of undiagnosed diabetes and IFG. The standardized prevalence of undiagnosed diabetes was 1.11%, 0.99%, 1.16%, and 0.99% for 2012–2014, 2015–2016, 2017–2018, and 2019–2020, respectively. Moreover, the standardized prevalence of IFG_110 for the four groups was 4.49%, 3.73%, 4.30%, and 4.66%, while the standardized prevalence of IFG_100 was 21.0%, 18.26%, 20.16%, and 21.08%. No obvious increasing or decreasing trend over the years was observed for either the prevalence of undiagnosed diabetes or IFG.

Figure 2

The standardized prevalence and its 95% CIs for undiagnosed diabetes, IFG_110 (fasting glucose between 110 and 125 mg/dL), and IFG_100 (fasting glucose between 100 and 125 mg/dL) over the years. IFG, impaired fasting glucose.

For identifying risk factors, 140 variables remained after QC. Table 2 shows the significant variables selected from the Lasso regression based on these 140 variables and the estimates of the effects of the significant variables based on the forward continuation ratio model for Models 1 and 2. Variables commonly included in other prediction models, such as age, BMI, waist to hip ratio (WHR), self-reported hypertension, and family history of diabetes, were selected. Model 1 also included education levels and betel nut chewing, which may be specific to the Western Pacific or Taiwan population. On the other hand, Model 2 included alcohol consumption and personal monthly income, which were not included in Model 1. These additional variables provide further insight into the risk factors associated with undiagnosed diabetes and IFG.

Table 2

Significant prediction variables selected by the Lasso regression for Models 1 and 2

Table 3 shows the AUCs for predicting undiagnosed diabetes versus non-diabetes (including IFG and healthy reference group) and for predicting undiagnosed diabetes or IFG versus healthy reference group in the overall, male, and female samples based on the testing dataset using Models 1 and 2. Generally, predicting undiagnosed diabetes alone yielded higher AUCs than predicting undiagnosed diabetes or IFG, and prediction in females also had higher AUCs than prediction in males. Furthermore, Model 1, which defined IFG using a more stringent threshold, generally demonstrated higher AUCs compared with Model 2. We further applied Models 1 and 2 to predict undiagnosed diabetes or IFG+/HbA1c+ versus healthy reference group, and the results are also shown in table 3. The AUCs were all higher than those for predicting undiagnosed diabetes or IFG versus healthy reference group, once again highlighting that AUCs were higher when the IFG definition was more stringent.

Table 3

Areas under the curve (AUCs) with their 95% CIs for predicting undiagnosed diabetes and IFG

The optimal cut-off thresholds selected based on the Youden index from the ROC curves generated in the overall sample in the testing dataset were used to calculate the sensitivities and specificities. The cut-off threshold of p1 for predicting undiagnosed diabetes was 0.0065, and the threshold of p2 for predicting undiagnosed diabetes or IFG was 0.0367 in Model 1. In Model 2, the thresholds for p1 and p2 were 0.0053 and 0.1773, respectively. Figure 3 shows the results for the testing dataset from the Taiwan Biobank and the validation dataset from the CVDFACTS in Model 1. The overall sensitivity and specificity were 75.6% and 72.4% for the testing dataset, and 67.6% and 61.9% for the validation dataset, respectively, for predicting undiagnosed diabetes in Model 1. For predicting undiagnosed diabetes or IFG, the overall sensitivity was higher (81.7%), but specificity was lower (61.5%) than those for predicting undiagnosed diabetes. The same trend was observed for the validation dataset. In Model 2, the sensitivity was higher for predicting undiagnosed diabetes compared with predicting undiagnosed diabetes or IFG, while the specificities were similar, as shown in online supplemental figure S4 in the online supplemental materials. The estimates from the CVDFACTS were generally lower than the estimates from the Taiwan Biobank testing dataset. This is not surprising since the cut-off thresholds were optimized based on the Taiwan Biobank testing dataset. We also calculated the sensitivity and specificity using only male or female samples. As shown in figure 3 and online supplemental figure S4, the sensitivities were higher in males than in females, while the specificities were higher in females than in males.

Figure 3

Sensitivities and specificities for Model 1. The upper section of the figure illustrates the sensitivities (left) and specificities (right) of the model in predicting undiagnosed diabetes versus non-diabetes. The lower section of the figure shows the sensitivities (left) and specificities (right) for predicting undiagnosed diabetes or IFG_110 (fasting glucose between 110 and 125 mg/dL) versus healthy reference group. The sensitivities and specificities were calculated in the overall, male, and female samples in the Taiwan Biobank testing dataset (TWB) and the external CardioVascular Disease risk FACtors Two-township Study (CVDFACTS) validation dataset. The 95% CIs are depicted as error bars in the figure. The results for TWB and CVDFACTS are represented in orange and green bars, respectively.

Discussion

To our knowledge, the prevalence of undiagnosed diabetes and IFG in Taiwan using the WHO’s definition has not been reported in the literature. For example, in the IDF Diabetes Atlas (10th edition) report, the undiagnosed rates of diabetes and prevalence of IFG in Taiwan were extrapolated from data in nearby countries with similar ethnicity, language, and World Bank income classification. Our results fill this gap and provide information that will improve the estimates of both prevalences in the Western Pacific region and globally.

Our results revealed only minor variation in the prevalence of undiagnosed diabetes in Taiwan, ranging from 0.99% to 1.16% during 2012–2020, without an apparent increasing or decreasing trend. The estimates were close to the recent estimates in the USA (from 1.10% to 1.23%) using the same definition of undiagnosed diabetes.26 However, the prevalence estimates from our study were lower than those calculated for the population without known diabetes in Japan (2.9%–5.6%).33 34 This is expected, as their definition for undiagnosed diabetes was broader, requiring either fasting glucose ≥126 mg/dL or HbA1c ≥6.5%, while our definition required both criteria to be met. Furthermore, the prevalence of IFG based on the WHO definition in our study was estimated between 3.73% and 4.66% from 2012 to 2020. This estimate aligns closely with the extrapolated prevalence of 4.5% in Taiwan in 2021, as reported in the IDF Diabetes Atlas (10th edition). On the other hand, the prevalence of IFG based on the ADA definition in Taiwan was estimated to be between 18.26% and 21.08% from 2012 to 2020, which is higher than the 16% found in the Thai population.35

Our variable selection procedure identified both BMI and WHR as significant predictors. This is consistent with the finding based on an Indian population that a composite measure of BMI and waist circumference resulted in a better predictor for type 2 diabetes than either BMI or waist circumference alone.19 More interestingly, our variable selection procedure identified education level and betel nut chewing in Model 1 and personal monthly income in Model 2 as risk factors not commonly included in predicting undiagnosed diabetes in the literature. For example, in the review paper of risk prediction models for incident or undiagnosed diabetes by Collins et al,36 education level was included as a risk predictor for incident diabetes in only one model that is also based on a Taiwan population,37 and none of the models reviewed in the paper included betel nut chewing. A previous study in Taiwan reported that education level was negatively associated with 5-year diabetes incidence,37 while another showed that patients with diabetes with higher education levels had better knowledge of diabetes.38 As discussed in Hill-Briggs et al,39 while education level and personal income are correlated, they have distinct implications for health outcomes. In addition, a higher prevalence of diabetes has been found in populations with lower income in studies from the USA and Canada.39 40 A study from Taiwan also suggests that poverty is associated with not only diabetes incidence but also inequality of diabetes care.41 Hence, as suggested by Sun et al,37 considering social deprivation in diabetes prevention is important in reducing health inequalities.

Furthermore, a study showed that the prevalence of betel nut chewing is high in Taiwan, Mainland China, Malaysia, Indonesia, Nepal, and Sri Lanka.42 In Taiwan, this prevalence was approximately 7% in 2018.43 Betel nut chewing has been found to be associated with current and incident diabetes.44 45 Our result provides information for identifying risk factors and developing prediction models for undiagnosed diabetes and IFG in countries with a high prevalence of betel nut chewing.

The performance of our model in predicting undiagnosed diabetes (with an overall AUC of 80.39% for Model 1 and 77.87% for Model 2) is comparable to the models in the literature using simple questionnaires. For example, AUCs of 72%–80% were reported for predicting undiagnosed diabetes.15–17 33 46 47 Our model also has a higher AUC than the Taiwan DRS (reported as 76% by Li et al20). Moreover, our model yielded higher AUCs for predicting undiagnosed diabetes or IFG (overall AUC of 78.25% for Model 1 and 74.39% for Model 2) than previous studies, which reported AUCs of 67%–72%.16 48–50 This could be because we specifically considered the healthy reference group, IFG, and undiagnosed diabetes as three ordinal outcomes in the same model. Interestingly, when our models were applied to identify the high-risk pre-diabetes subgroup (IFG+/HbA1c+) or undiagnosed diabetes, the AUCs further increased to 79.35% and 78.32% for Models 1 and 2, respectively.

There were several strengths in our study. Large-scale biobanks usually contain a high proportion of related samples. The samples included in this study all had SNP array data, which allowed us to perform stringent sample QC and identify unrelated individuals. Furthermore, linking the Taiwan Biobank with NHIRD allowed us to use the survey and blood test results from the Taiwan Biobank and medical records from NHIRD to robustly define known and undiagnosed diabetes. Finally, the large sample size from the Taiwan Biobank allowed us to estimate the prevalence and train and fine-tune the prediction model. A limitation of our study is that the 2-hour plasma glucose based on OGTT was not measured in the Taiwan Biobank. Hence, the prevalence of IGT and pre-diabetes, defined using both IFG and IGT, could not be estimated. Furthermore, in our study, undiagnosed diabetes was identified based on a single measurement of fasting glucose and HbA1c at recruitment, in contrast to the clinical practice of using repeated measurements for diabetes diagnosis. However, Selvin et al25 have shown that using both fasting glucose and HbA1c measurements in one sample can yield a high positive predictive value for subsequent diagnosis, which effectively reduces the potential drawback of relying on a one-time measurement.

In conclusion, our study documented the current trends in the prevalence of undiagnosed diabetes and IFG in Taiwan. We also identified risk factors that are important for predicting undiagnosed diabetes and IFG. The prediction model will be useful in identifying individuals with undiagnosed diabetes or individuals with a high risk of developing diabetes in Taiwan.

Ethics approval

This study involves human participants and was approved by the institutional review board of National Health Research Institutes (reference no: EC1091202-E). Participants gave informed consent to participate in the study before taking part.

Acknowledgments

We thank the participants from the Taiwan Biobank and CVDFACTS studies.

Footnotes

  • Contributors R-HC, Y-EC, C-HH, H-YC, and CAH designed the study. R-HC and G-HL performed the analyses. S-YC performed the validation analysis. R-HC is the guarantor. All authors helped interpret the analysis results and approved the final manuscript.

  • Funding This study was supported by grants PH-111-GP-04 and PH-111-PP-10 from the National Health Research Institutes, and MOST 110-2314-B-400-023 from the National Science and Technology Council in Taiwan.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.