
Artificial intelligence-enabled screening for diabetic retinopathy: a real-world, multicenter and prospective study
  1. Yifei Zhang1,
  2. Juan Shi1,
  3. Ying Peng1,
  4. Zhiyun Zhao1,
  5. Qidong Zheng2,
  6. Zilong Wang3,
  7. Kun Liu4,
  8. Shengyin Jiao3,
  9. Kexin Qiu3,
  10. Ziheng Zhou3,5,
  11. Li Yan6,
  12. Dong Zhao7,
  13. Hongwei Jiang8,
  14. Yuancheng Dai9,
  15. Benli Su10,
  16. Pei Gu11,
  17. Heng Su12,
  18. Qin Wan13,
  19. Yongde Peng14,
  20. Jianjun Liu15,
  21. Ling Hu16,
  22. Tingyu Ke17,
  23. Lei Chen18,
  24. Fengmei Xu19,
  25. Qijuan Dong20,
  26. Demetri Terzopoulos21,22,
  27. Guang Ning1,
  28. Xun Xu4,
  29. Xiaowei Ding3,5,
  30. Weiqing Wang1
  1. 1Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  2. 2Department of Internal Medicine, The Second People’s Hospital of Yuhuan, Yuhuan, China
  3. 3Department of Research, VoxelCloud, Shanghai, China
  4. 4Department of Ophthalmology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  5. 5Department of Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
  6. 6Department of Ophthalmology, The Third People’s Hospital of Datong, Datong, China
  7. 7Center for Endocrine Metabolism and Immune Diseases, Beijing Luhe Hospital, Capital Medical University, Beijing, China
  8. 8Department of Endocrinology and Metabolism, The First Affiliated Hospital, and College of Clinical Medicine of Henan University of Science and Technology; Luoyang City Clinical Research Center for Endocrinology and Metabolism, Luoyang, China
  9. 9Department of Internal Medicine of Traditional Chinese Medicine, Sheyang Diabetes Hospital, Yancheng, China
  10. 10Department of Endocrinology, The Second Affiliated Hospital Dalian Medical University, Dalian, China
  11. 11Department of Endocrinology, Datong Coal Group Ltd. General Hospital, Datong, China
  12. 12Department of Endocrine and Metabolic Diseases, The First People’s Hospital of Yunnan Province, Kunming, China
  13. 13Department of Endocrinology and Metabolism, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China
  14. 14Department of Endocrinology and Metabolism, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  15. 15Department of Endocrinology, Longkou People’s Hospital, Yantai, China
  16. 16Department of Endocrinology, The Third Affiliated Hospital of Nanchang University, Nanchang, China
  17. 17Department of Endocrinology, The Second Affiliated Hospital of Kunming Medical University, Kunming, China
  18. 18Department of Endocrinology and Metabolism, The Affiliated Suzhou Hospital of Nanjing Medical University, Suzhou, Jiangsu, China
  19. 19Department of Endocrinology and Metabolism, Hebi Coal (group) Ltd. General Hospital, Hebi, China
  20. 20Department of Endocrinology and Metabolism, People’s Hospital of Zhengzhou, Zhengzhou, China
  21. 21Department of Computer Science, Computer Graphics & Vision Laboratory, University of California Los Angeles, Los Angeles, California, USA
  22. 22Department of Research, VoxelCloud, Los Angeles, California, USA
  1. Correspondence to Dr Weiqing Wang; wqingw61{at}163.com; Mr Xiaowei Ding; dingxiaowei{at}sjtu.edu.cn

Abstract

Introduction Given the growing diabetes epidemic, an efficient and scalable method for early screening of diabetic retinopathy (DR) is urgently needed to reduce blindness. The aim of the study was to validate an artificial intelligence-enabled DR screening system and to investigate the prevalence of DR in adult patients with diabetes in China.

Research design and methods The study was prospectively conducted at 155 diabetes centers in China. One non-mydriatic, macula-centered fundus photograph was collected per eye and graded by a deep learning (DL)-based five-stage DR classification system. Images from a randomly selected one-third of participants were used to validate the DL algorithm.

Results In total, 47 269 patients (mean (SD) age, 54.29 (11.60) years) were enrolled. Images from 15 805 randomly selected participants were reviewed by a panel of specialists for DL algorithm validation. The DR grading algorithms had an 83.3% (95% CI: 81.9% to 84.6%) sensitivity and a 92.5% (95% CI: 92.1% to 92.9%) specificity for detecting referable DR. The five-stage DR classification performance (concordance: 83.0%) was comparable to the interobserver variability of specialists (concordance: 84.3%). The estimated prevalence of any DR, referable DR and vision-threatening DR detected by the DL algorithm in patients with diabetes was 28.8% (95% CI: 28.4% to 29.3%), 24.4% (95% CI: 24.0% to 24.8%) and 10.8% (95% CI: 10.5% to 11.1%), respectively. The prevalence was higher in women, in older patients, and in groups with longer diabetes duration and higher glycated hemoglobin.

Conclusion This nationwide, multicenter, DL-based DR screening study indicated the importance and feasibility of DR screening in clinical practice with this system deployed at diabetes centers.

Trial registration number NCT04240652.

  • diabetic retinopathy
  • diagnostic techniques and procedures
  • epidemiology
  • clinical study

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.


Significance of this study

What is already known about this subject?

  • Previous studies have indicated a high prevalence of diabetes in China; however, the reported prevalence of diabetic retinopathy (DR) has varied, and a nationwide program for DR screening is lacking.

  • Automated deep learning (DL) algorithms have shown potential value in DR screening; however, their feasibility in clinical application in highly heterogeneous populations needs further investigation.

What are the new findings?

  • We validated an artificial intelligence (AI)-enabled DR screening system in real-world practice at 155 diabetes centers, with performance comparable to that of human specialists.

  • Our study is a large-scale nationwide DR screening program using data from representative cohorts, and it offers evidence of DR prevalence in patients with diabetes in China.

  • It provides evidence of the efficiency and accuracy of DL-based DR screening in clinical practice through a comprehensive survey.

How might these results change the focus of research or clinical practice?

  • DL-based DR screening at diabetes centers is feasible and, given the high prevalence of DR detected, may offer a potential solution to this public health problem in the future.

Introduction

According to recent estimates, there were 451 million people with diabetes aged 18–99 years worldwide in 2017, and the number will increase to 693 million by 2045.1 The diabetes epidemic is worse in China.2 3 According to the 2013 national survey, 10.9% of Chinese adults were estimated to have diabetes; among them, only 36.5% were aware of the diagnosis and 32.2% were treated.3 The higher prevalence and lower treatment rate of diabetes in China will lead to a higher incidence of diabetes-related complications nationwide.4 5

Diabetic retinopathy (DR) is one of the common chronic complications of diabetes and, although preventable, is the leading cause of blindness in the working-age population.6–8 Early screening and timely referral can delay its progression and effectively prevent vision loss.9 However, given the high prevalence of diabetes in China, the capacity to screen for DR is inadequate and a nationwide DR screening program is lacking. The reasons are multifaceted, including the shortage of eye care specialists, the lack of efficient screening methods and the multidisciplinary process from image acquisition to the diagnosis of DR. In real-world clinical settings, a large proportion of patients with diabetes receive their first DR diagnosis at self-initiated ophthalmology visits in the symptomatic stage of DR, rather than earlier, in the non-symptomatic stage, at diabetes centers or through referral to ophthalmologists.10–12 In addition, DR management strategies in China are difficult to reproduce across regions because of economic disparities and differences in lifestyle. Therefore, it is essential to establish a standardized system for early DR detection and management that is feasible nationwide.

Deep learning (DL), a form of artificial intelligence (AI), has emerged and shown convincing performance in several areas, including medical science.13–15 A recent study by Ting et al16 demonstrated the potential value of an automated DL system for DR grading using images from multiethnic cohorts of patients with diabetes, and several other studies have shown high sensitivity and specificity in identifying DR (especially referable DR), indicating that the proper use of DL technology in clinical settings may help deliver data-driven analytics for better patient outcomes.16–23

However, evidence confirming the clinical value of DL for DR screening in large-scale healthcare settings is insufficient, and most studies have been performed on high-quality image datasets that can hardly represent the variable image quality and other operational limitations of real-world DR screening at diabetes centers.17 18 Few reports describe the practical application of AI in clinic-based DR screening; two such studies enrolled cohorts of 3049 and 1415 patients, respectively.20 24 The feasibility and quality of AI-enabled screening in real-world use must be further explored using datasets with larger sample sizes and greater demographic variation.

Therefore, in the present study, we conducted a prospective, nationwide DR screening using a DL algorithm in a cohort of 47 269 patients at 155 diabetes centers in China. We validated the operational feasibility and accuracy of the DL algorithm and report the prevalence of DR, referable DR (moderate non-proliferative DR (NPDR) or worse) and vision-threatening DR (VTDR; severe NPDR or worse and/or clinically significant macular edema (CSME)).

Methods

Population

The National Metabolic Management Center (MMC) is a pilot diabetes care system in China, founded in 2016. It aims to establish a nationwide, standardized and reproducible platform, based on advanced medical equipment and Internet of Things technology, for the diagnosis and management of diabetes and its complications.25 The Diabetic Retinopathy Screening and Prevention Program is an MMC branch project. Its purpose is to develop an efficient workflow for the early detection, timely follow-up and management of DR, and to establish a referral system for future treatment and long-term follow-up.

Between June 2018 and August 2019, a total of 47 269 consecutive patients with diabetes aged 18 years or older were enrolled from 155 MMCs in China. The participating MMCs were located in hospitals at different levels of the tiered medical service system across 26 provinces. All participants were screened for DR by the DL-based system, which assigned each fundus image a DR stage or labeled it ungradable because of image quality issues. Fundus images from a randomly selected one-third of participants were reviewed offline in a two-stage reading by a panel of specialists to validate the DL algorithm for both DR grading and image quality assessment (figure 1).
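A participant-level random draw of this kind can be sketched as follows. This is a minimal illustration, not the study's procedure: the text does not state how the one-third sample was drawn (the validation set of 15 805 of 47 269 participants is close to, but not exactly, one-third), so simple random sampling without replacement, the helper name `select_validation_participants` and the seed are all assumptions. Sampling participants rather than images keeps both eyes of a selected participant in the validation set.

```python
import random
from typing import Iterable, Set

def select_validation_participants(participant_ids: Iterable[int],
                                   fraction: float = 1 / 3,
                                   seed: int = 2018) -> Set[int]:
    """Simple random sample of participants (not images), without replacement.

    Hypothetical helper; the study's actual sampling method is not described.
    """
    ids = sorted(participant_ids)          # sort so the draw does not depend on set order
    k = round(len(ids) * fraction)
    rng = random.Random(seed)              # hypothetical fixed seed for reproducibility
    return set(rng.sample(ids, k))

# Hypothetical records: one (participant_id, eye) tuple per fundus image.
images = [(pid, eye) for pid in range(30) for eye in ("right", "left")]
chosen = select_validation_participants({pid for pid, _ in images})
validation_images = [img for img in images if img[0] in chosen]
print(len(chosen), len(validation_images))  # 10 20
```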

Figure 1

Fundus image grading workflow and adjudication. DL, deep learning; DR, diabetic retinopathy.

All the participants underwent a full medical examination at the local MMCs.

Baseline data collection

The eligible participants were those with a diagnosis of diabetes according to the WHO criteria.26 Detailed inclusion and exclusion criteria are summarized in the online supplemental methods. At baseline, all data (including a standardized questionnaire and comprehensive clinical and laboratory examinations) were collected from each participant through an MMC specialized electronic medical record system.25


Data collection was conducted by trained staff according to a standard protocol. Sociodemographic characteristics, medical history and lifestyle factors were recorded. Height and body weight were measured with a height-weight scale while participants wore light clothing and no shoes, and body mass index (BMI) was calculated as weight in kilograms divided by height in meters squared. Blood pressure and heart rate were measured with electronic blood pressure monitors after at least 5 min of rest in the seated position. Waist circumference was measured with participants standing, midway between the lower edge of the costal arch and the upper edge of the iliac crest. Participants underwent a standard steamed bread meal test after an overnight fast, and blood samples were collected at 0 and 2 hours during the test. Detailed data collection procedures are listed in the online supplemental methods.
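As a worked example of the BMI definition above, using a hypothetical participant (the helper name `bmi` is illustrative only):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical participant: 70 kg, measured height 175 cm (converted to meters).
print(round(bmi(70.0, 175 / 100), 1))  # 22.9
```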

Fundus photography acquisition

One standard, non-mydriatic, non-stereoscopic, 45° field-of-view, macula-centered color fundus image was acquired from each eye of each participant. Various fundus camera models were used; Topcon TRC-NW400, MiiS DSC-200, Canon CR-2 PLUS AF, Canon CR-2 AF and Zeiss VISUCAM200 cameras were used at >80% of the centers (online supplemental table 1). At all centers, trained technicians took only non-mydriatic images, and no additional images were acquired after pupillary dilation. All participants' images were anonymized before grading.

Development of the DL algorithms

VoxelCloud Retina, an automated retinal disease screening system, was used to grade fundus images. The VoxelCloud Retina DR system was developed using DL techniques.

Two sets of data were used to train the different deep learning networks that form the final ensemble of DR and diabetic macular edema (DME) severity classification modules. The first dataset comprises 143 626 fundus photographs of 37 231 patients obtained from 2005 to 2015 from a large private retinal image database (online supplemental tables 2 and 3).

The second dataset comprises 1184 color fundus images from a public hospital in China, which were assigned a DR severity grade based on consensus among three ophthalmologists (online supplemental table 4). These data were chosen to improve model performance on ambiguous cases that could fall on the boundary between two grades.

The DR and DME models are an ensemble of six neural networks (online supplemental figure 1). All the six neural networks use the state-of-the-art Inception-ResNet v2 architecture;27 however, several design differences among them are critical for the effective performance of the model ensemble. Details are presented in the online supplemental methods.
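The text defers the details of how the six networks' outputs are combined to the online supplemental methods. The sketch below assumes one common choice, soft voting (averaging each stage's softmax probability across members and taking the most probable stage); this is an illustrative assumption, not the VoxelCloud Retina implementation, and the probability values are made up.

```python
from typing import Sequence

DR_STAGES = ("no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR")

def ensemble_grade(member_probs: Sequence[Sequence[float]]) -> int:
    """Soft voting: average each stage's probability over ensemble members,
    then return the index of the stage with the highest mean probability."""
    n_classes = len(member_probs[0])
    if any(len(p) != n_classes for p in member_probs):
        raise ValueError("every member must score the same stages")
    mean = [sum(p[c] for p in member_probs) / len(member_probs)
            for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])

# Hypothetical softmax outputs from six networks for one fundus image.
members = [
    [0.1, 0.2, 0.6, 0.1, 0.0],
    [0.1, 0.5, 0.3, 0.1, 0.0],
    [0.0, 0.3, 0.5, 0.2, 0.0],
    [0.2, 0.4, 0.3, 0.1, 0.0],
    [0.1, 0.2, 0.5, 0.2, 0.0],
    [0.1, 0.3, 0.4, 0.2, 0.0],
]
print(DR_STAGES[ensemble_grade(members)])  # moderate NPDR
```

Averaging probabilities rather than counting votes lets a confident member outweigh several uncertain ones; a majority vote would be an equally plausible alternative.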

In addition to the DR and DME models, the system also includes trained independent lesion models that detect the presence of lesions that contribute to DR grade, including fundus hemorrhage, hard exudates and laser scars. These independent lesion models are used to achieve improvements in the DR prediction performance, based on the DR classification rules listed in online supplemental table 5.

All color fundus images are normalized to pixel intensity values between 0 and 1 and are resized to a standard resolution of 800 by 800 pixels before being processed by the system.
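The normalization and resizing step can be sketched with NumPy. The text fixes only the output range (0 to 1) and size (800 by 800 pixels); dividing 8-bit intensities by 255 and using nearest-neighbor resizing that ignores aspect ratio are assumptions made here to keep the example self-contained, and the helper name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(image: np.ndarray, size: int = 800) -> np.ndarray:
    """Scale an 8-bit fundus image to [0, 1] and resize it to size x size.

    Assumed details (not given in the text): division by 255 for
    normalization and nearest-neighbor resizing that ignores aspect ratio.
    """
    if image.dtype != np.uint8:
        raise TypeError("expected an 8-bit image")
    scaled = image.astype(np.float32) / 255.0
    h, w = scaled.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    return scaled[rows[:, None], cols[None, :]]

# Synthetic 1200 x 1600 RGB image standing in for a camera capture.
raw = (np.arange(1200 * 1600 * 3) % 256).astype(np.uint8).reshape(1200, 1600, 3)
out = preprocess(raw)
print(out.shape, out.dtype)  # (800, 800, 3) float32
```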

The system was also tested on various private and public datasets. The results on the APTOS 2019 Blindness Detection dataset, a public dataset also collected in a real-world scenario similar to that of the present study, are reported in the online supplemental methods (https://www.kaggle.com/c/aptos2019-blindness-detection/overview).

DL-based DR grading

The system specified for DR screening comprises the following three modules:

Quality control module

The quality control (QC) module evaluates the quality of fundus images before the five-stage DR grading. The quality of fundus images is classified as gradable or ungradable. Those assessed as ungradable (low quality) are not sent for further DR grading. Gradable images are further sorted into excellent and adequate quality, while ungradable images are sorted into insufficient information and non-fundus images. The gradeability criteria in the model training phase were: 1) the image must cover at least 45° of the retinal area with the macula and the optic disc visible; 2) at least 80% of the retinal area must be recognizable and 3) no overexposure, underexposure or blur caused by focusing failure and motion.
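The three gradeability criteria can be expressed as a rule check, for example when labeling training images. This sketch does not reproduce the QC module itself, which is a trained model; the `QCFindings` fields are hypothetical, and mapping any failed criterion to the "insufficient information" category is an assumption.

```python
from dataclasses import dataclass

@dataclass
class QCFindings:
    """Hypothetical per-image findings; field names are illustrative only."""
    is_fundus: bool
    field_of_view_deg: float
    macula_visible: bool
    optic_disc_visible: bool
    recognizable_fraction: float  # share of the retinal area that is readable (0-1)
    exposure_ok: bool             # neither overexposed nor underexposed
    in_focus: bool                # no blur from focusing failure or motion

def meets_gradeability_criteria(f: QCFindings) -> bool:
    """Criteria 1-3 from the text: coverage, recognizability, exposure/focus."""
    coverage = f.field_of_view_deg >= 45 and f.macula_visible and f.optic_disc_visible
    recognizable = f.recognizable_fraction >= 0.80
    return coverage and recognizable and f.exposure_ok and f.in_focus

def qc_label(f: QCFindings) -> str:
    """Route an image: only 'gradable' images go on to DR grading."""
    if not f.is_fundus:
        return "ungradable: non-fundus image"
    if not meets_gradeability_criteria(f):
        return "ungradable: insufficient information"  # assumed mapping
    return "gradable"

dark = QCFindings(True, 45, True, True, 0.9, exposure_ok=False, in_focus=True)
print(qc_label(dark))  # ungradable: insufficient information
```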

DR severity classification module

The DR severity classification module assigns each fundus image a five-stage DR severity grade that can be further converted into multiple binary classifications to meet different needs. The severity classification mainly follows the International Clinical Diabetic Retinopathy (ICDR) severity scale,28 which was developed by the International Council of Ophthalmology and adopted by the American Academy of Ophthalmology.29 Slight modifications were made because only a single non-mydriatic fundus image covering the posterior pole was acquired from each eye, instead of seven mydriatic images covering all four quadrants (online supplemental table 5).30

The patient-level DR grade was based on the worse DR grade of the two eyes. If both eye images of a patient were classified as ungradable, the patient was classified as ungradable. If only one eye image was classified as gradable, the patient-level DR grade was based on that eye. If a patient had only one eye image and it was classified as ungradable, the patient was classified as ungradable.
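The patient-level rules above reduce to "take the worst gradable eye". A minimal sketch, assuming grades are encoded as integers 0 (no DR) to 4 (PDR) and `None` marks an ungradable image (both encodings are illustrative choices):

```python
from typing import Optional, Sequence

def patient_grade(eye_grades: Sequence[Optional[int]]) -> Optional[int]:
    """Patient-level DR grade from per-eye grades.

    Each entry is a DR grade (0 = no DR ... 4 = PDR) or None for an
    ungradable image. Ungradable eyes are ignored; the patient receives the
    worse (higher) remaining grade, or None (ungradable) if no eye is gradable.
    """
    gradable = [g for g in eye_grades if g is not None]
    return max(gradable) if gradable else None

print(patient_grade([1, 3]))        # 3    (worse eye)
print(patient_grade([None, 2]))     # 2    (only gradable eye)
print(patient_grade([None, None]))  # None (both eyes ungradable)
print(patient_grade([None]))        # None (single, ungradable image)
```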

DME severity classification module

The DME severity classification module assigns each fundus image a three-stage DME severity estimate that can be further converted into multiple binary classifications. Because DME assessment requires retinal thickness information, which cannot be obtained from non-mydriatic fundus images, the presence of hard exudates is regarded as a presumptive diagnosis of DME (online supplemental table 6).

DR was defined as the presence of mild NPDR or worse; referable DR as moderate NPDR or worse; and VTDR as severe NPDR or worse and/or CSME.
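The three outcome definitions can be written as threshold checks on the five-stage scale. In this sketch the `DRStage` encoding is hypothetical, and CSME is passed in as a boolean because the text does not specify how the three-stage DME estimate maps to CSME.

```python
from enum import IntEnum
from typing import Dict

class DRStage(IntEnum):
    NO_DR = 0
    MILD_NPDR = 1
    MODERATE_NPDR = 2
    SEVERE_NPDR = 3
    PDR = 4

def binary_outcomes(stage: DRStage, csme: bool) -> Dict[str, bool]:
    """Derive the three binary outcomes from a DR stage and a CSME flag."""
    return {
        "any_dr": stage >= DRStage.MILD_NPDR,
        "referable_dr": stage >= DRStage.MODERATE_NPDR,
        "vtdr": stage >= DRStage.SEVERE_NPDR or csme,
    }

print(binary_outcomes(DRStage.MILD_NPDR, csme=True))
# {'any_dr': True, 'referable_dr': False, 'vtdr': True}
```

As the example shows, an eye with only mild NPDR still counts as VTDR when CSME is present, because the VTDR definition uses "and/or".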

Expert ground truth grading

The ground truth for fundus image diagnosis was provided by a two-stage reading by specialist graders. The grading team was led by the Ophthalmology Center of the Shanghai General Hospital (National Clinical Research Center for Eye Diseases). All graders were ophthalmologists from tertiary hospitals with 3 years or more of work experience. Each grader finished two rounds of training and passed a qualification test following ICDR guidelines. Graders were divided into primary graders and reviewers (senior graders) based on their seniority and performance. The grading was conducted in two stages.

Stage 1

Two primary graders read each fundus image and assigned image quality and DR grades independently. If the two primary graders reached a consensus on both the image quality and DR grades, the grading of this fundus image ended in stage 1 and the grades served as the ground truth.

Stage 2

If the two primary graders disagreed on either the image quality or the DR grade, a reviewer (senior grader) with access to both primary graders' assessments was added to the grading process. The reviewer's decision alone served as the final grade for such cases (figure 1).
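The two-stage reading logic can be sketched as a small function. The `Grade` record and the fixed `senior_reviewer` decision are hypothetical stand-ins for the human graders; the point is that stage 1 requires agreement on both quality and DR grade, and any disagreement goes to a single reviewer.

```python
from typing import Callable, NamedTuple, Optional, Tuple

class Grade(NamedTuple):
    gradable: bool
    dr_stage: Optional[int]  # None when the image is ungradable

Reviewer = Callable[[Grade, Grade], Grade]

def reference_grade(primary_1: Grade, primary_2: Grade,
                    reviewer: Reviewer) -> Tuple[Grade, str]:
    """Stage 1: agreement on both quality and DR grade is final.
    Stage 2: otherwise the senior reviewer, who sees both primary
    assessments, decides alone."""
    if primary_1 == primary_2:
        return primary_1, "stage 1"
    return reviewer(primary_1, primary_2), "stage 2"

def senior_reviewer(a: Grade, b: Grade) -> Grade:
    # Hypothetical reviewer decision, fixed here for illustration.
    return Grade(True, 2)

print(reference_grade(Grade(True, 1), Grade(True, 1), senior_reviewer))
print(reference_grade(Grade(True, 1), Grade(True, 2), senior_reviewer))
```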

Statistical analysis

Statistical analyses were performed with SPSS V.22.0 (Chicago, Illinois, USA). Data are presented as mean and SD for continuous variables, or number and percentage for categorical variables. The prevalence (95% CI) of DR, referable DR and VTDR was estimated overall and compared across subgroups of sex, age, diabetes duration and glycated hemoglobin (HbA1c) categories with the χ2 test. Demographic and clinical characteristics were compared by sex with the χ2 test for categorical variables and with Student's t-test for continuous variables.
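The prevalence estimates and subgroup comparisons above can be sketched as follows. The text does not state which CI method SPSS was configured to use, so the normal-approximation (Wald) interval is an assumption, as is the Pearson χ2 test without continuity correction for a 2×2 table; the counts are hypothetical, and tests for trend are not shown.

```python
import math
from typing import Tuple

def wald_ci(cases: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Prevalence with a normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square without continuity correction for [[a, b], [c, d]]
    (rows = groups, columns = DR yes/no). Returns (statistic, two-sided p)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value

# Hypothetical counts, not study data.
print(wald_ci(288, 1000))
print(chi2_2x2(10, 20, 20, 10))
```

With one degree of freedom the χ2 statistic is the square of a standard normal deviate, which is why the complementary error function gives the p value without a statistics library.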

Fundus images from a randomly selected one-third of participants were used for DL algorithm validation. The ground truth provided by the expert panel was considered the reference standard. The accuracy of DR grading, image quality assessment and two-category derivatives (a given DR grade or worse) was evaluated. Consistency and accuracy were analyzed between the DL algorithm and the reference standard, between the two primary graders in the expert panel, and between the primary graders and the reference standard. 2×2 tables were generated to analyze the sensitivity, specificity, negative predictive value and positive predictive value of the DL algorithm in detecting DR, referable DR and severe NPDR or worse, as well as image quality, compared with the reference standard at the individual eye level. Kappa and quadratic weighted kappa scores were also calculated from the five-stage grading confusion matrices. All p values were two-tailed, and p<0.05 was considered statistically significant.
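The eye-level 2×2 analysis for a "grade or worse" derivative can be sketched as follows. The names `TwoByTwo` and `table_at_threshold` and the toy grades are hypothetical; the metric definitions are the standard ones, including the Youden index (sensitivity + specificity − 1) reported in the results.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def youden(self) -> float:
        return self.sensitivity + self.specificity - 1

def table_at_threshold(reference: Sequence[int], predicted: Sequence[int],
                       threshold: int) -> TwoByTwo:
    """Eye-level 2x2 table for 'grade >= threshold' (e.g. 2 = moderate NPDR or worse)."""
    pairs = [(r >= threshold, p >= threshold) for r, p in zip(reference, predicted)]
    return TwoByTwo(
        tp=sum(r and p for r, p in pairs),
        fp=sum(p and not r for r, p in pairs),
        fn=sum(r and not p for r, p in pairs),
        tn=sum(not r and not p for r, p in pairs),
    )

# Hypothetical eye-level grades (0 = no DR ... 4 = PDR).
reference = [0, 0, 1, 2, 3, 4, 2, 0]
predicted = [0, 1, 2, 2, 4, 3, 1, 0]
t = table_at_threshold(reference, predicted, threshold=2)  # referable DR
print(t, t.sensitivity, t.specificity, t.youden)
```

With the study's referable-DR sensitivity (83.3%) and specificity (92.5%), the Youden index is 0.833 + 0.925 − 1 = 0.758, matching the reported 75.8%.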

Results

Clinical characteristics of all the participants

In total, 47 269 participants with diabetes from 155 centers were enrolled in the present study, among which 27 110 (57.4%) were men (table 1 and figure 2). The mean (SD) age of all the participants was 54.29 (11.60) years, the mean diabetes duration was 6.80 (6.71) years and the mean HbA1c was 9.06 (2.27) % or 75.45 (24.85) mmol/mol. Since 97.92% of the participants had type 2 diabetes (1.61% type 1 diabetes, 0.32% gestational diabetes and 0.14% others, totaling 99.99% due to rounding), no further analysis was performed based on the diabetes classification.
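HbA1c is reported in both NGSP units (%) and IFCC units (mmol/mol). The paper does not say which conversion it used; the sketch below assumes the commonly cited IFCC–NGSP master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152. With these coefficients, the 10.0% cut-off used later in the results converts to about 85.8 mmol/mol, close to the reported 85.77 mmol/mol; small differences may reflect rounded coefficients.

```python
# Assumed conversion: IFCC-NGSP master equation
#   NGSP (%) = 0.09148 * IFCC (mmol/mol) + 2.152
SLOPE, INTERCEPT = 0.09148, 2.152

def percent_to_mmol_mol(hba1c_percent: float) -> float:
    """NGSP/DCCT HbA1c (%) -> IFCC HbA1c (mmol/mol)."""
    return (hba1c_percent - INTERCEPT) / SLOPE

def mmol_mol_to_percent(hba1c_mmol_mol: float) -> float:
    """IFCC HbA1c (mmol/mol) -> NGSP/DCCT HbA1c (%)."""
    return SLOPE * hba1c_mmol_mol + INTERCEPT

print(round(percent_to_mmol_mol(10.0), 1))  # 85.8
```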

Table 1

Clinical characteristics of the study participants

Figure 2

Geographic distribution of the 155 metabolic management centers in China involved in this study.

DL algorithm validation

A total of 31 498 images from a randomly selected one-third of participants (n=15 805) were used for DL algorithm validation (figure 1).

For image quality assessment, 26 698 (84.8%) of these images were assessed as gradable by the reference standard (online supplemental table 7). Compared with the reference standard, the QC module had a sensitivity of 63.3% (95% CI: 61.9% to 64.7%) and a specificity of 85.0% (95% CI: 84.6% to 85.4%) for identifying ungradable images, with a positive predictive value of 43.2% (95% CI: 42.0% to 44.3%) and a negative predictive value of 92.8% (95% CI: 92.4% to 93.1%). The interobserver variability between the two primary expert graders (setting one grader as the reference standard) corresponded to a sensitivity of 69.6% (95% CI: 68.4% to 70.8%) and a specificity of 86.8% (95% CI: 86.4% to 87.2%), with a positive predictive value of 53.1% (95% CI: 51.9% to 54.2%) and a negative predictive value of 93.0% (95% CI: 92.7% to 93.3%).
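The QC predictive values follow from its sensitivity, specificity and the share of ungradable images via Bayes' rule. Treating "ungradable" as the positive class is an inference (it is the only reading consistent with the reported values), and the helper name `predictive_values` is illustrative. Using the reported inputs reproduces the reported PPV and NPV up to rounding.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """PPV and NPV implied by sensitivity, specificity and prevalence (Bayes)."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

# Positive class = ungradable image (inferred from the reported values).
prevalence_ungradable = (31_498 - 26_698) / 31_498  # about 15.2% of validation images
ppv, npv = predictive_values(0.633, 0.850, prevalence_ungradable)
print(f"PPV {ppv:.1%}, NPV {npv:.1%}")  # PPV 43.1%, NPV 92.8%
```

The low prevalence of ungradable images is what keeps the PPV below 50% despite a specificity of 85%.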

For DR grading, the concordance between the DL algorithm and the reference standard was 83.0% for the five-stage DR grading, with a quadratic weighted kappa of 0.72 (95% CI: 0.72 to 0.72) (online supplemental table 8 and online supplemental figure 2). The DL algorithm had an 83.3% (95% CI: 81.9% to 84.6%) sensitivity and 92.5% (95% CI: 92.1% to 92.9%) specificity for detecting referable DR. The positive and negative predictive values were 61.8% (95% CI: 60.3% to 63.3%) and 97.4% (95% CI: 97.2% to 97.7%), respectively, and the Youden index was 75.8%. For the two-stage manual grading, the concordance for five-stage DR grading between the two primary graders, and between the primary graders and the reference standard, was 84.3% and 91.0%, respectively, with corresponding quadratic weighted kappas of 0.74 (95% CI: 0.74 to 0.74) and 0.87 (95% CI: 0.87 to 0.87). The concordance between the DL algorithm and primary grader 1, primary grader 2 and either primary grader (the two primary graders combined) was 82.8%, 81.8% and 82.3%, respectively, with corresponding quadratic weighted kappas of 0.66 (95% CI: 0.66 to 0.66), 0.67 (95% CI: 0.67 to 0.67) and 0.67 (95% CI: 0.67 to 0.67) (online supplemental table 8). Confusion matrices of the five-stage DR evaluation between the two primary graders, and between the primary graders and the reference standard, are reported in online supplemental figures 3 and 4.
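Concordance (exact agreement) and quadratic weighted kappa can both be computed from a five-stage confusion matrix. The sketch uses the standard definitions with weights (i − j)² / (k − 1)²; the helper names and the toy two-class example are illustrative, and the CI method used in the paper is not described.

```python
from typing import List, Sequence

def confusion_matrix(a: Sequence[int], b: Sequence[int], k: int = 5) -> List[List[int]]:
    """k x k counts; rows index rater a's grade, columns rater b's grade."""
    m = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        m[x][y] += 1
    return m

def concordance(m: List[List[int]]) -> float:
    """Share of images on which both raters assigned exactly the same grade."""
    return sum(m[i][i] for i in range(len(m))) / sum(map(sum, m))

def quadratic_weighted_kappa(m: List[List[int]]) -> float:
    """1 - sum(w*O) / sum(w*E), with w_ij = (i - j)^2 / (k - 1)^2 and E
    built from the row and column marginals of the observed matrix m."""
    k, n = len(m), sum(map(sum, m))
    rows = [sum(m[i]) for i in range(k)]
    cols = [sum(m[i][j] for i in range(k)) for j in range(k)]
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            observed += w * m[i][j] / n
            expected += w * rows[i] * cols[j] / n ** 2
    return 1 - observed / expected

m = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], k=2)
print(concordance(m), quadratic_weighted_kappa(m))  # 0.75 0.5
```

Because the quadratic weights penalize a two-stage disagreement four times as much as a one-stage one, kappa can be lower than raw concordance suggests when errors are large. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(a, b, weights="quadratic")` computes the same statistic from the raw grade lists and can serve as a cross-check.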

Typical examples of false negative and false positive cases from the DL QC and DR grading modules are shown in online supplemental figures 5 and 6.

AI-enabled DR screening

In total, 94 199 fundus images from all participants were graded by the DL algorithm. Of these, 22 404 (23.8%) images were assessed as high quality, 49 566 (52.6%) as medium quality and 22 229 (23.6%) as low quality (ungradable) by the QC module (online supplemental table 9). Thus, 71 970 (76.4%) images from 40 665 (86.0%) participants qualified for DR grading by the DL algorithm (online supplemental tables 9 and 10). Ungradable images were mainly attributable to small pupil size, cataracts or other rare eye diseases, and camera operation problems (online supplemental table 11).19 21 22 31–33 Participants with ungradable images were referred to the ophthalmology department for further examination.

Among the 40 665 gradable participants, the estimated prevalence of DR was 28.8% (95% CI: 28.4% to 29.3%), that of referable DR was 24.4% (95% CI: 24.0% to 24.8%) and that of VTDR was 10.8% (95% CI: 10.5% to 11.1%) (table 2). In stratified analyses, the estimated prevalence of DR was higher in women (29.6%; 95% CI: 28.9% to 30.3%) than in men (28.3%; 95% CI: 27.7% to 28.8%) (p=0.0029). The estimated prevalence of DR increased with age and duration of diabetes (both p values for trend <0.0001). Similar results were found for referable DR and VTDR across these risk factor strata. By HbA1c stratification, the prevalence of DR and referable DR increased with rising HbA1c when HbA1c was <10.0% (85.77 mmol/mol) (both p values for trend <0.0001), but decreased slightly, without statistical significance, when HbA1c was 10.0% or higher (both p values >0.05). The prevalence of VTDR increased consistently with rising HbA1c (p value for trend <0.0001) (table 2, online supplemental tables 12 and 13, and online supplemental figure 7).

Table 2

Prevalence of diabetic retinopathy (DR), referable DR and vision-threatening DR (VTDR) in total and among different risk factor stratification

The five-stage DR grading and corresponding DME classification results by the DL algorithm for the 40 665 gradable participants are shown in online supplemental table 14. The percentage of ungradable images and the DR grading results by camera type are listed in online supplemental tables 15 and 16.

Discussion

In this large multicenter, real-world DR screening program, a DL-based AI system was deployed at 155 diabetes centers. Our study demonstrated that, in Chinese adults with diabetes, the estimated prevalence for any DR, referable DR and VTDR was 28.8%, 24.4% and 10.8%, respectively. The high prevalence of DR in various stages indicated the importance and urgency of early detection of DR in China. A DL system with comparable sensitivity and specificity to a panel of specialists enabled the efficient screening for DR at diabetes centers nationwide, and it may provide a solution to this problem.

Screening for DR in daily clinical work has not yet been well established at diabetes centers in China because of limited resources, infrastructure and retinal specialists. Therefore, a comprehensive survey of DR prevalence and its actual burden across the whole country remains lacking.6 Every diabetes center needs timely diagnosis and treatment of DR to achieve better outcomes across the widest possible population with diabetes, regardless of geographic and economic barriers.

Epidemiological studies published in the past 10 years have shown that the prevalence of DR among patients with diabetes in China ranges from 5.4% to 44.8%.6–8 34 This variability was mainly due to heterogeneity among the studies, including sample size, study design, clinical characteristics of participants, geographic region and DR classification criteria. A recent meta-analysis of 31 community-based studies showed that the pooled prevalence in participants with diabetes was 18.45% for any DR, 15.06% for NPDR and 0.99% for PDR.6 However, a single survey reporting the actual prevalence of DR across the whole country is lacking. In the present study, a large multicenter DR screening program, implemented with the aid of AI technology, was conducted in 26 provinces in China. The survey provides the most up-to-date information on DR characteristics in adults with diabetes and indicates a high prevalence of DR in China. In addition, stratified analysis showed that the crude prevalence of DR was higher in older age groups, which, together with population aging, will increase the burden on the healthcare system. However, the lower prevalence of DR in subgroups with lower HbA1c suggests that better glycemic control may lessen eye complications.

Most DL-based DR grading studies have focused on methodology development and validation using high-quality, curated public datasets.17–19 Implementation of automated DL algorithms for DR screening in real-world practice has been rare.20 23 24 35 One example is the large community-based, nationwide DR screening program using a DL algorithm in Thailand.23 Two other examples of clinical use were reported by Gulshan et al and van der Heijden et al, respectively.20 24 The former study involved 3049 patients with diabetes at two eye care clinics in India and reported sensitivities of 88.9% and 92.1% and specificities of 92.2% and 95.2% for detecting moderate or worse DR in the two clinics, respectively.20 The latter was performed at the Hoorn diabetes center and included 1415 patients; compared with a reference standard adjudicated by a panel of three experts, the IDx-DR device had a 68.0% sensitivity and 86.0% specificity for detecting referable DR based on the ICDR standard, whereas the three experts had an averaged sensitivity of 74.7% and specificity of 99.7% against the same reference standard. However, the quality of the fundus images collected was unsatisfactory, possibly because the study was implemented in a non-ophthalmic clinical setting.24 These studies offered good examples and indicated the feasibility and validity of implementing DL in real-world clinical workflows. However, in these studies the DL algorithms were deployed only at individual centers with small or moderate sample sizes. Whether DL-based systems can be widely deployed across multiple non-ophthalmic medical centers or healthcare systems with different resources remains unclear.

Therefore, in the present study we applied a DL algorithm for DR screening at 155 diabetes care centers involving 47 269 patients with diabetes in China. A variety of fundus cameras meeting the base requirements for photograph acquisition were used. The DL algorithms provided a five-stage DR severity grading and DME detection in a real-time manner. The DL system was integrated with various fundus camera models used in MMCs, allowing seamless, push-button image QC and DR staging onsite. None of the deep neural networks in the DL system was trained or fine tuned using any MMC images, demonstrating strong domain transfer and generalization capability, as well as robustness and reproducibility on unseen images. The sensitivity for detecting referable DR was 83.3%, and the specificity was 92.5%, with an Youden index of 75.8%. The performance of the DL system is comparable to the interobserver variability of specialists who are limited in availability (1.1 hour/day on average) and have a long response time (1.5 days on average) in real-world practice. The high specificity (92.5%) performance of the DL system in detecting referable DR may be used as a safe and low-false-alarm autonomous referral decision, that is, all patients classified as referable DR by the algorithms are referred to specialists without further manual review. The algorithms were trained on datasets collected from different populations and scenarios, and they show good generalization characteristics. Furthermore, in order to evaluate the effects of the QC module of the DL system, the quality assessment results obtained by the algorithm and by the reference standard were compared. 
Although low-quality images are inevitable in non-ophthalmic clinical settings, enabling AI QC feedback during image acquisition allowed the proportion of qualified images to reach 92.8% of all fundus images acquired, according to the negative predictive value of the QC model. Together with strengthened training of technicians' operating skills (ie, identifying patients with small pupils or cataracts, and correcting problems with image contrast or focus), this should minimize the proportion of low-quality images and make subsequent DR grading more reliable in future work.
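The referable-DR operating point reported in this section can be checked with simple arithmetic: the Youden index is J = sensitivity + specificity − 1, so 0.833 + 0.925 − 1 = 0.758. The Python sketch below derives all three quantities from a 2×2 table of algorithm calls against the reference standard. The class name and the counts are hypothetical round numbers chosen only to reproduce the reported rates; they are not study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Algorithm calls for referable DR tabulated against the reference standard (hypothetical)."""
    tp: int  # referable DR according to both the algorithm and the reference standard
    fp: int  # referable according to the algorithm only
    fn: int  # referable according to the reference standard only
    tn: int  # non-referable according to both

    def sensitivity(self) -> float:
        # Fraction of truly referable patients that the algorithm flags
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of truly non-referable patients that the algorithm clears
        return self.tn / (self.tn + self.fp)

    def youden_index(self) -> float:
        # J = sensitivity + specificity - 1; 0 means chance-level, 1 means perfect
        return self.sensitivity() + self.specificity() - 1.0


# Hypothetical counts (NOT study data) chosen to give 83.3% sensitivity and 92.5% specificity.
cm = ConfusionMatrix(tp=833, fp=75, fn=167, tn=925)
print(f"sensitivity={cm.sensitivity():.3f} specificity={cm.specificity():.3f} J={cm.youden_index():.3f}")
```

Because J weights sensitivity and specificity equally, it summarizes a single operating point without depending on disease prevalence, which is why it is convenient for comparing screening tools across populations.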

There are several strengths in the present study. First, it was conducted at 155 diabetes centers in China. The results are representative because the participating MMCs were located in hospitals at different levels of the tiered medical service system and in regions with different economic and cultural backgrounds. Furthermore, the sample size was large, and consecutive patients were enrolled with an appropriate sex ratio and wide distributions of age, diabetes duration and metabolic control, mimicking the characteristics of diabetes in real-world practice. Second, it was a large AI-enabled DR screening program with performance comparable to that of specialists. Given the markedly increased prevalence of diabetes and the relatively inadequate medical resources in China, the automated DL system proved to be a scalable solution for effective screening of patients at diabetes centers, which diagnose and manage the majority of patients with diabetes. In addition, the image QC module substantially increased the validity and accuracy of DR screening, enabling regular DR screening in non-ophthalmic clinical settings.

The study has several limitations. First, because the study was conducted at multiple clinical centers, the DR prevalence, even with the large sample size, was not commensurate with that of the general population. Second, the prevalences of DR (27.57%) and referable DR (16.59%) estimated by the reference standard in a randomly selected one-third of participants were lower than those estimated by AI screening. The relatively high negative predictive value but lower positive predictive value of the DL algorithm might lead to an overestimate of DR prevalence. Conversely, other factors, such as obtaining only a single non-mydriatic fundus photograph instead of multifield fundus photographs, might lead the DL algorithm to underestimate DR prevalence. In addition, there were disagreements between the human graders and the DL QC model. A typical false negative was an out-of-focus image judged ungradable by the algorithm but gradable by the graders, whereas a typical false positive was an overly dark image judged ungradable by the graders but gradable by the algorithm (online supplemental figure 5). For all these reasons, the current findings should be interpreted with caution.
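The direction of the prevalence bias described above follows from a standard relation between true prevalence p, sensitivity Se, specificity Sp and the apparent prevalence a screen reports: apparent = Se·p + (1 − Sp)(1 − p). The screen overstates prevalence whenever false positives outnumber false negatives, that is, when (1 − Sp)(1 − p) > (1 − Se)·p. The sketch below plugs in the referable-DR sensitivity and specificity reported earlier (83.3% and 92.5%) together with a hypothetical true prevalence of 15%; that prevalence and the resulting numbers are illustrative only, not study results. The Rogan–Gladen correction shown alongside is a textbook adjustment included for illustration; it is not a method used in this study.

```python
def apparent_prevalence(true_prev: float, sens: float, spec: float) -> float:
    """Expected proportion screened positive: true positives plus false positives."""
    return sens * true_prev + (1.0 - spec) * (1.0 - true_prev)


def rogan_gladen(apparent: float, sens: float, spec: float) -> float:
    """Back-correct an apparent prevalence; the denominator is the Youden index."""
    j = sens + spec - 1.0
    if j <= 0:
        raise ValueError("test must perform better than chance (Youden index > 0)")
    # Clamp to [0, 1] because sampling noise can push the raw estimate outside this range.
    return min(1.0, max(0.0, (apparent + spec - 1.0) / j))


SENS, SPEC = 0.833, 0.925  # referable-DR operating point reported in the text
p = 0.15                   # hypothetical true prevalence, for illustration only

a = apparent_prevalence(p, SENS, SPEC)
print(f"apparent={a:.4f}")                             # exceeds p: false positives dominate
print(f"corrected={rogan_gladen(a, SENS, SPEC):.4f}")  # recovers p
```

Because the correction divides by the Youden index, it becomes unstable for tests close to chance performance, which is why the sketch rejects J ≤ 0.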

In conclusion, in the present study we validated the feasibility and accuracy of an automated DL algorithm for DR screening and surveyed the prevalence of DR, referable DR and VTDR at 155 diabetes centers in China. With performance comparable to that of human specialists and good scalability, the automated system may offer effective, cost-efficient and practical screening in routine diabetes follow-up and retinal complication management. More diabetes centers and primary care facilities are now joining the program to improve and validate the screening and referral procedures, with the aim of mitigating this public health problem.

Acknowledgments

The authors would like to thank all study participants and participating centers. We thank Drs Yufan Wang (Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China), Weijiang Chu (Laizhou Municipal Hospital, Shandong Province, China), Lin Zhang (Bayannur Hospital, Inner Mongolia Autonomous Region, China), Yanmei Yu (Mudanjiang Cardiovascular Hospital, Heilongjiang Province, China), Xingjian Zhou (Xiangyang No.1 People’s Hospital, Hubei Province, China), Hongmei Qiu (People’s Hospital of Yuxi City, Yunnan Province, China), Wenbing Ai (Yiling Hospital of Yichang, Hubei Province, China), Xueqin Wang (First People’s Hospital of Nantong, Jiangsu Province, China), Zhiqiang Kang (Zhengzhou Central Hospital, Henan Province, China), Xiaowen Chen (Huangshi Central Hospital, Hubei Province, China), Chunlei Deng (People’s Hospital of Ningxiang, Hunan Province, China), Mingfu Ma (The Fifth People’s Hospital of Qinghai Province, Qinghai Province, China), Hong Yang (Ruian People’s Hospital, Zhejiang Province, China), Huige Shao (Changsha Central Hospital, Hunan Province, China), Shen Qu (Shanghai Tenth People’s Hospital, Shanghai, China), Feixia Shen (The First Affiliated Hospital of Wenzhou Medical University, Zhejiang Province, China), Bangqun Ji (Xingyi People’s Hospital, Guizhou Province, China), Jianling Du (The First Affiliated Hospital of Dalian Medical University, Liaoning Province, China), Riqiu Chen (Lishui City People’s Hospital, Zhejiang Province, China), Wei Tang (Geriatric Hospital of Nanjing Medical University, Jiangsu Province, China), Xueyuan Jia (Dujiangyan Medical Center, Sichuan Province, China), Laixiang Li (Qinan County Hospital of Traditional Chinese Medicine, Gansu Province, China), Lin Yuan (Zhuhai People’s Hospital, Guangdong Province, China), Yongjun Chen (Hongze Huaian District People’s Hospital, Jiangsu Province, China), Ning Xu (The First People’s Hospital of Lianyungang, Jiangsu Province, China), Lin Gao (Affiliated Hospital of Zunyi Medical University, Guizhou Province, China), Canli Gu (Ruyang People’s Hospital, Henan Province, China), Zhaoli Yan (The Affiliated Hospital of Inner Mongolia Medical University, Inner Mongolia Autonomous Region, China), Wenzhi Zhang (Qixia People’s Hospital, Shandong Province, China), Bingyin Shi (The First Affiliated Hospital of Xi'an Jiaotong University, Shaanxi Province, China), Hongyan Deng (Puai Hospital, Hubei Province, China), Jie Shen (The Third Affiliated Hospital of Southern Medical University, Guangdong Province, China), Anhua Huang (Renhuai City People Hospital, Guizhou Province, China), Feng Wei (Binzhou People’s Hospital, Shandong Province, China), Yufang Gao (Zhuozhou City Hospital, Hebei Province, China), Jinhong Chen (People’s Hospital of Hengxian County, Nanning, Guangxi, China), Yu Zhao (Baoan Central Hospital of Shenzhen, Guangdong Province, China), Xinhua Ye (Changzhou No.2 People’s Hospital, Jiangsu Province, China), Weici Xie (The First People’s Hospital of Tianmen in Hubei Province, China), Jinsong Kuang (The Fourth Hospital of People, Shenyang, Liaoning Province, China), Yan Feng (Mudanjiang City Second People’s Hospital, Heilongjiang Province, China), Yunping Zhang (The People’s Hospital of Xiangyun, Yunnan Province, China), Wei Zhu (Sucheng Cai Town Hospital, Jiangsu Province, China), Shiwei Cui (Affiliated Hospital of Nantong University, Jiangsu Province, China), Zunhai Zhou (Yangpu Hospital, Tongji University, Shanghai, China), Xiaoqing Su (Jiangxi Pingxiang People’s Hospital, Jiangxi Province, China), Yu Shi (Qidong People’s Hospital, Jiangsu Province, China), Xiaoju Qi (The People’s Hospital of Yunxian, Yunnan Province, China), Jie Yang (Fujian Jianou Hospital, Fujian Province, China), Jianying Pu (Shanghai Gonghui Hospital, Shanghai, China), Ping Tang (Shenzhen Luohu Hospital Group, Guangdong Province, China), Rongyue Chen (Third People’s Hospital of Xuchang, Henan Province, China), Yingli Pan (Fangda Medical Yingkou People’s Hospital, Liaoning Province, China), Jinhua Qiu (People’s Hospital of Xinfeng County, Jiangxi Province, China), Liwei Qiu (Anyang Hospital of Traditional Chinese Medicine, Henan Province, China), Hui Cao (First People’s Hospital of Shangqiu, Henan Province, China), Xurong Jia (People’s Hospital of Rongshui Miao Autonomous County, Guangxi, China), Shaofang Wang (The People’s Hospital of Anyang City, Henan Province, China), Jun Liao (People’s Hospital of Ruijin City, Jiangxi Province, China), Xiaomin Xie (The First People’s Hospital of Yinchuan, Ningxia Hui Autonomous Region, China), Zhihong Zeng (Longquan People’s Hospital, Zhejiang Province, China), Yiyuan Yao (First People’s Hospital of Xiushui County, Jiangxi Province, China), Xiaoshu Wang (West China-Guang'an Hospital, Sichuan University, Sichuan Province, China), Huiju Zhong (Xiangya Changde Hospital, Hunan Province, China), Jialin Xia (Guixi People’s Hospital, Jiangxi Province, China), Xiujun Yan (First People’s Hospital of Guannan County, Jiangsu Province, China), Sha Gan (Fengqing County People’s Hospital of Yunnan, Yunnan Province, China), Lianzeng Sun (Shandong Energy Zibo Mining Group Co., Ltd Central Hospital, Shandong Province, China), Bo Zhang (The People’s Hospital of Shimen County, Hunan Province, China), Dadong Fei (Zaozhuang Municipal Hospital, Shandong Province, China), Lianhuan Zhang (Shaoxing Hospital of Traditional Chinese Medicine, Zhejiang Province, China), Hui Zheng (People’s Hospital of Wulian County, Shandong Province, China), Shan Dong (First People’s Hospital of Qingzhen, Guizhou Province, China), Bin Liu (Nuclear Industry Beijing 401 Hospital, Beijing, China), Xianchen Liu (Chifeng City Center Hospital Ningcheng County, Inner Mongolia Autonomous Region, China), Bi Lu (Aoyang Hospital, Jiangsu Province, China), Ling Gao (Xiangyang Central Hospital, Hubei Province, China), Xuejian Ni (Taiping Street Community Health Service Center, Suzhou Xiangcheng District, Jiangsu Province, China), Xiangning Sun (Central Hospital of Qinghe County, Hebei Province, China), Qian Zhang (The Second Affiliated Hospital of Guizhou Medical University, Guizhou Province, China), Laijun Qiao (Gongyi City People’s Hospital, Henan Province, China), Hongjun Fu (Taizhou Enze Medical Center Luqiao Hospital, Zhejiang Province, China), Jingwen Gan (Liyuan Community Health Service Center, Tongzhou District, Beijing, China), Haiying Niu (Luquan People’s Hospital, Hebei Province, China), Cuirong Wu (Shenzhen Zhonghai Hospital, Guangdong Province, China), Libo Chen (Shenzhen Nanshan Hospital, Guangdong Province, China), Zhiyuan Yang (Luoyang Central Hospital Affiliated to Zhengzhou University, Henan Province, China), Mingjun Gu (Shanghai Pudong Gongli Hospital, Shanghai, China), Xiaoyan Shi (PKUCare Luzhong Hospital, Shandong Province, China), Rong Li (Chongzhou People’s Hospital, Sichuan Province, China), Chunxiao Shi (People’s Hospital of Anshun City, Guizhou Province, China), Xu Lian (Hongqi Hospital Affiliated to Mudanjiang Medical University, Heilongjiang Province, China), Fengshi Tian (Tianjin 4th Centre Hospital, Tianjin, China), Yugang Hu (Chaozhou People’s Hospital, Guangdong Province, China), Lingling Xu (Shenzhen Hospital of Southern Medical University, Guangdong Province, China), Jianbo Yun (Weitang Health Center, Xinbei District, Changzhou, Jiangsu Province, China), Wangjun Chen (Taicang Shaxi People’s Hospital, Jiangsu Province, China), Weiyuan Huang (Jiangyin Harbour Hospital, Jiangsu Province, China), Yun Liang (People’s Hospital of Fengdu County, Chongqing, China), Tao Yang (Jiangsu Province Hospital, Jiangsu Province, China), Angui Yang (Zhongxiang People’s Hospital, Hubei Province, China), Weiping Tu (Shaoxing Shangyu People’s Hospital, Zhejiang Province, China), Yaoming Xue (Nanfang Hospital, Southern Medical University, Guangdong Province, China), Zheng Li (Daxing Xihongmen Hospital, Beijing, China), Pengqiu Li (Sichuan Academy of Medical Sciences·Sichuan Provincial People’s Hospital, China), Xiaopang Rao (Qingdao Chengyang People’s Hospital, Shandong Province, China), Li Yan (Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangdong Province, China), Guiyang Liu (People’s Hospital of Renshou County, Meishan, Sichuan Province, China), Junqiang Ba (The First People's Hospital of Zunyi, Guizhou Province, China), Yezi Sun (Zhangjiagang First People’s Hospital, Jiangsu Province, China), Zhewu Yin (Fu Ning People’s Hospital, Jiangsu Province, China), Wenbing Xu (People’s Hospital of Yungang District, Datong City, Shanxi Province, China), Xiongwei Dong (Fangsong Street Community Health Service Center, Songjiang District, Shanghai, China), Wei Wang (Xiang'an Hospital of Xiamen University, Fujian Province, China), Xiaotai Jin (Xinrui Hospital, Wuxi New District, Jiangsu Province, China), Binbin Tian (Yuzhou City People’s Hospital, Henan Province, China), Zhigang Zhao (Zhengzhou Yihe Hospital, Henan Province, China), Zuhua Gao (Taizhou Hospital of Zhejiang Province, China), Chunlong Mei (Longhua County Hospital, Hebei Province, China), Qiaoyun Qian (Dongtai Hospital of Traditional Chinese Medicine, Jiangsu Province, China), Yunxia Chen (Cangzhou People’s Hospital, Hebei Province, China), Peng Su (Tongnan District People’s Hospital, Chongqing, China), Jingze Huang (Pingtan Comprehensive Experimental Area Hospital, Fujian Province, China), Hongxia Tang (Zhangjiakou First Hospital, Hebei Province, China), Tongfu Bian (Yancheng Bufeng Central Health Center, Jiangsu Province, China), Xuefeng Li (Affiliated Taihe Hospital of Hubei University of Medicine, China), Guiying Wang (The Fifth People’s Hospital of Datong, Shanxi Province, China), Ziqi Zhao (Liaoning Lida Diabetes Hospital, Liaoning Province, China), Guoqi Yang (Second People’s Hospital of Yandu District, Jiangsu Province, China), Chunfang Qian (Chedun Town Community Health Service Center, Songjiang District, Shanghai, China), Yong Dai (People’s Hospital of Qingxian, Hebei Province, China), Yaxiong Shi (The Second Affiliated Hospital of Fujian Medical University, Fujian Province, China), Fuzai Yin (First Hospital of Qinhuangdao, Hebei Province, China), Xuemin Li (Handan Seventh Hospital, Hebei Province, China), Wei Wang (Xinbang Town Community Health Service Center, Songjiang District, Shanghai, China), Yane Liu (Shanxian Central Hospital, Shandong Province, China), Xiaohua Li (Shanghai Seventh People’s Hospital, Shanghai, China), Yanling Feng (The First People’s Hospital of Jinzhong, Shanxi Province, China) for collecting data and caring for patients.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • YZ, JS, YiP, ZhZ, QZ and ZW are joint first authors.

  • Contributors WW, XX, XD and GN conceived and designed the study. YZ, JS, YiP, ZhZ, ZW and SJ analyzed the data. JS, YiP, QZ, LY, DZ, HJ, YD, BS, PG, HS, QW, YoP, JL, LH, TK, LC, FX and QD contributed to data collection. KQ organized the expert panel for manual grading. KL and XX directed the fundus image diagnosis by specialist graders. XD, ZiZ, KQ, ZW and SJ were involved in the development, optimization and verification of the DL algorithms. YZ, JS, YiP, ZhZ, QZ, ZW, XD and DT drafted and revised the manuscript. WW, XD and YZ approved the final version of the manuscript. WW and XD are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the decision to submit for publication.

  • Funding This research was supported by grants from National Key R&D Program of China (2016YFC0901200, 2018YFC1314800); Chinese Academy of Engineering (2019-XZ-42); the National Natural Science Foundation of China (81670797); the Program for Shanghai Outstanding Medical Academic Leader (2019LJ07); the Youth Program of Shanghai Municipal Health and Family Planning Commission (20174Y0081) and the Yang Fan Project of Shanghai Science and Technology Committee (19YF1442700).

  • Map disclaimer The depiction of boundaries on the map(s) in this article does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. The map(s) are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Ethics approval The study protocol was approved by the ethics committee of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, the leading MMC (KY2018-103-3), and subsequently, where necessary, by the ethics committees of the other participating centers. This study complied with the provisions of the Declaration of Helsinki. All study participants provided written informed consent.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Data are available on reasonable request.
