Objective Diagnosis codes might be used for diabetes surveillance if they accurately distinguish diabetes type. We assessed the validity of International Classification of Disease, 10th Revision, Clinical Modification (ICD-10-CM) codes to discriminate between type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM) among health plan members with youth-onset (diagnosis age <20 years) diabetes.
Research design and methods Diabetes case identification and abstraction of diabetes type were done as part of the SEARCH for Diabetes in Youth Study. The gold standard for diabetes type is the physician-assigned diabetes type documented in patients’ medical records. Using all healthcare encounters with ICD-10-CM codes for diabetes, we summarized codes within each encounter and determined diabetes type using percent of encounters classified as T2DM. We chose 50% as the threshold from a receiver operating characteristic curve because this threshold yielded the largest Youden’s index. Persons with ≥50% T2DM-coded encounters were classified as having T2DM. Otherwise, persons were classified as having T1DM. We calculated sensitivity, specificity, positive and negative predictive values, and accuracy overall and by demographic characteristics.
Results According to the gold standard, 1911 persons had T1DM and 652 persons had T2DM (mean age (SD): 19.1 (6.5) years). We obtained 90.6% (95% CI 88.4% to 92.9%) sensitivity, 96.3% (95% CI 95.4% to 97.1%) specificity, 89.3% (95% CI 86.9% to 91.6%) positive predictive value, 96.8% (95% CI 96.0% to 97.6%) negative predictive value, and 94.8% (95% CI 94.0% to 95.7%) accuracy for discriminating T2DM from T1DM.
Conclusions ICD-10-CM codes can accurately classify diabetes type for persons with youth-onset diabetes, showing promise for rapid, cost-efficient diabetes surveillance.
- type 1 diabetes mellitus
- type 2 diabetes mellitus
- electronic health records
- international classification of diseases
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Significance of the study
What is already known about this subject?
Currently, diabetes surveillance in youth is conducted by active surveillance, which is slow, labor-intensive, and expensive.
What are the new findings?
International Classification of Disease, 10th Revision, Clinical Modification (ICD-10-CM) codes from the electronic health records of a large integrated healthcare delivery system can be used to accurately discriminate between type 1 and type 2 diabetes among people with youth-onset (diagnosis age <20 years) diabetes.
How might these results change the focus of research or clinical practice?
The finding that ICD-10-CM codes can accurately classify type 1 and type 2 diabetes in youth and young adults diagnosed with diabetes before age 20 years supports the use of ICD-10-CM codes for rapid and cost-efficient diabetes surveillance. Discriminating between diabetes types is an important component of diabetes surveillance.
In the USA, national diabetes surveillance commonly uses information from surveys such as the National Health and Nutrition Examination Survey (NHANES), the Behavioral Risk Factor Surveillance System (BRFSS), and the National Health Interview Survey (NHIS).1 2 These sources do not identify diabetes type, capture a limited number of pediatric diabetes cases (NHANES and NHIS), or do not collect data in persons aged <18 years (BRFSS). The SEARCH for Diabetes in Youth (SEARCH) study conducts active surveillance of youth-onset (diagnosis age <20 years) physician-diagnosed diabetes and reported trends in diabetes incidence and prevalence.3–5 However, active surveillance is labor intensive and expensive. Using electronic health record (EHR) information might be a potential cost-efficient long-term surveillance approach for childhood diabetes.6
Previous studies have investigated the performance of International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) codes in determining diabetes type in youth and adults. Among youth, studies used information in the EHR of a large integrated healthcare delivery system6 and academic health centers7 8 to investigate the utility of ICD-9-CM codes alone or in combination with medications and laboratory values to distinguish between type 1 diabetes mellitus (T1DM) and type 2 diabetes mellitus (T2DM). The criteria that performed the best for T1DM and T2DM included ICD-9-CM codes only. Criteria that used medications alone, such as insulin, metformin, or glucagon, performed worse than ICD-9-CM codes alone, and adding medication and laboratory results to ICD-9-CM codes did not improve performance.6 7 Consistent with findings for youth, studies in adults showed that adding medications (insulin, glucagon, oral hypoglycemics, and metformin) to ICD-9-CM codes did not improve classification of diabetes type.9 10
Although ICD-9-CM codes have been shown to accurately distinguish between diabetes types, the utility of newer ICD-10-CM codes, which include more detailed codes describing the severity and complexity of diabetes, has not been investigated.
We evaluated the validity of using ICD-10-CM codes to discriminate between T1DM and T2DM among persons with youth-onset diabetes.
Research design and methods
Active surveillance of youth-onset diabetes has been conducted at Kaiser Permanente Southern California (KPSC), a large integrated healthcare delivery system, since 2001 as part of the SEARCH Registry Study. We used information from KPSC for these analyses. KPSC members are representative of southern California’s population11 and receive outpatient, inpatient, emergency department, urgent care, pharmacy, and laboratory services. Member utilization of services (encounters) is stored in the EHR system. In accordance with the SEARCH Registry protocol, KPSC members newly diagnosed with diabetes from January 1, 2002 through the present were registered as incident diabetes cases; prevalent cases were registered in 2001 and 2009.12 Through November 30, 2016, the cut-off date for inclusion in these analyses, 4915 KPSC members with diabetes (incident and prevalent cases) diagnosed before age 20 years were included in the SEARCH Registry Study (figure 1). Of these, 3100 were KPSC members on October 1, 2015, when ICD-10-CM coding was implemented. We identified persons with healthcare encounters during October 1, 2015–November 30, 2016 and restricted the analyses to persons with T1DM or T2DM (since these are the most common forms of diabetes) with ≥1 diabetes ICD-10-CM code from clinic-based encounters. Using this approach, we excluded 36 persons without healthcare encounters, 166 persons with encounters occurring exclusively outside clinic-based settings (eg, virtual, home care), 167 persons with encounters without diabetes ICD-10-CM codes recorded, 30 persons with diabetes types other than T1DM or T2DM, and 138 persons whose diabetes type was recorded as unknown based on the SEARCH protocol for ascertainment of diabetes type.
Gold standard for diabetes type
Based on the SEARCH study protocol, the gold standard for diabetes type is the physician-assigned diabetes type documented in the progress notes of patients’ medical records within 6 months of diagnosis for incident cases and in the prevalent year for prevalent cases.7 If patients saw more than one healthcare provider during that period, then the type assigned by their endocrinologist was recorded.
Diabetes ICD-10-CM codes
Diabetes ICD-10-CM codes were obtained from the EHRs for healthcare encounters during the study period. Diabetes ICD-10-CM diagnosis codes correspond to the following diabetes types: E10 codes are for T1DM; E11 codes for T2DM; E08–E09 for secondary diabetes; E13 for other specified diabetes mellitus including secondary diabetes not otherwise classified, and P70.2 for neonatal diabetes.
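The code-to-type correspondence described above can be sketched as a simple prefix lookup. This is a minimal illustration, not the study's actual extraction code: the function name `diabetes_category`, the category labels, and the handling of codes stored with or without a decimal point are all assumptions made for the example.

```python
from typing import Optional

# Prefix-to-category mapping taken from the paragraph above.
# Dots are removed so that "P70.2" and "P702" are treated alike (an assumption
# about how codes might be stored; the article does not describe storage format).
ICD10_DIABETES_PREFIXES = {
    "E10": "T1DM",
    "E11": "T2DM",
    "E08": "secondary",
    "E09": "secondary",
    "E13": "other_specified",
    "P702": "neonatal",
}


def diabetes_category(code: str) -> Optional[str]:
    """Return the diabetes category for an ICD-10-CM code, or None if the
    code is not one of the diabetes codes listed in the text."""
    normalized = code.strip().upper().replace(".", "")
    for prefix, category in ICD10_DIABETES_PREFIXES.items():
        if normalized.startswith(prefix):
            return category
    return None
```

For example, `diabetes_category("E11.65")` returns `"T2DM"`, while a non-diabetes code such as `"I10"` returns `None`.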
Other patient characteristics
Age on October 1, 2015, was calculated from date of birth. While all persons were <20 years old at time of diabetes diagnosis when they were registered for the SEARCH study, some were aged ≥20 years by October 1, 2015, when KPSC implemented ICD-10-CM coding. We included all persons, regardless of age in 2015, because we were interested in the performance of ICD-10-CM codes for diabetes type in both youth and adults. Race/ethnicity was categorized into Hispanic (regardless of race), Asian/Pacific Islander, non-Hispanic black, non-Hispanic white, and other race or unknown race/ethnicity. We obtained measured weight and height from the EHR for the index encounter (first encounter with ICD-10-CM diagnosis code of diabetes) if available. Otherwise, for persons aged <18 years at the index encounter, we obtained height and weight data within 61 days of the index encounter. For persons aged ≥18 years at the index encounter, we obtained weight within 183 days of the index encounter and height measured any time after age 18 years and closest to the index encounter. Weight and height were used to calculate body mass index (BMI), categorized as underweight, normal weight, overweight, or obese. For each encounter, we obtained the provider specialty (categorized as endocrinology or other).
Each person could have multiple healthcare encounters during the 14-month study period, and each encounter could have multiple diabetes codes. Diabetes ICD-10-CM codes were first summarized at the encounter level. We coded an encounter as T2DM if all codes within the encounter were T2DM and non-T2DM if there was ≥1 code for T1DM, secondary diabetes, or neonatal diabetes. We calculated the percent of all encounters that were T2DM coded for each person. A person with three T2DM-coded encounters and two non-T2DM-coded encounters would have 60% T2DM-coded encounters. We classified diabetes type using the percent of T2DM-coded encounters during the study period.
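The encounter-level and person-level summarization described above can be sketched as follows. This is a minimal illustration under stated assumptions: each encounter is represented as a list of that encounter's diabetes ICD-10-CM codes, T2DM codes are recognized by the `E11` prefix, and the function names are hypothetical. The 50% default threshold is the value reported elsewhere in the article.

```python
from typing import Iterable, List


def is_t2dm_code(code: str) -> bool:
    """True if the code is a T2DM (E11) code."""
    return code.strip().upper().startswith("E11")


def encounter_is_t2dm(diabetes_codes: Iterable[str]) -> bool:
    """An encounter is T2DM-coded only if every diabetes code in it is T2DM;
    any other diabetes code makes the encounter non-T2DM."""
    codes = list(diabetes_codes)
    if not codes:
        raise ValueError("encounter has no diabetes codes")
    return all(is_t2dm_code(c) for c in codes)


def percent_t2dm_encounters(encounters: List[List[str]]) -> float:
    """Percent of a person's encounters that are T2DM-coded."""
    if not encounters:
        raise ValueError("person has no encounters")
    n_t2dm = sum(encounter_is_t2dm(e) for e in encounters)
    return 100.0 * n_t2dm / len(encounters)


def classify_person(encounters: List[List[str]],
                    threshold_percent: float = 50.0) -> str:
    """Classify as T2DM when the percent of T2DM-coded encounters is at or
    above the threshold; otherwise classify as T1DM."""
    pct = percent_t2dm_encounters(encounters)
    return "T2DM" if pct >= threshold_percent else "T1DM"
```

With three T2DM-coded encounters and two non-T2DM-coded encounters, as in the example in the text, `percent_t2dm_encounters` gives 60% and `classify_person` returns `"T2DM"`.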
A receiver operating characteristic (ROC) curve was generated for all possible thresholds. We chose the threshold yielding the largest Youden’s index as the optimal threshold to maximize the sum of sensitivity and specificity.13 We classified persons with percent T2DM-coded encounters greater than or equal to the optimal threshold to have T2DM and persons with percent T2DM-coded encounters less than the threshold to have T1DM. Thus, T2DM sensitivity was equivalent to T1DM specificity, and T2DM specificity was equivalent to T1DM sensitivity. Maximizing the sum of T2DM sensitivity and specificity (Youden’s index) also maximizes the sum of T1DM sensitivity and specificity.
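Threshold selection by Youden's index (J = sensitivity + specificity − 1) can be sketched as below. This is an illustrative sketch only: the function name, the use of observed percent values as candidate thresholds, and the tie-breaking rule (the lowest threshold wins) are assumptions, and the data in the usage note are made up rather than taken from the study.

```python
from typing import Sequence, Tuple


def youden_optimal_threshold(percent_t2dm: Sequence[float],
                             is_t2dm: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1,
    where a person is classified as T2DM when percent >= threshold."""
    if len(percent_t2dm) != len(is_t2dm):
        raise ValueError("inputs must have the same length")
    n_pos = sum(1 for y in is_t2dm if y)
    n_neg = len(is_t2dm) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one person of each diabetes type")

    best_threshold, best_j = 0.0, float("-inf")
    # Candidate thresholds are the distinct observed values, in ascending order.
    for threshold in sorted(set(percent_t2dm)):
        tp = sum(1 for p, y in zip(percent_t2dm, is_t2dm) if y and p >= threshold)
        tn = sum(1 for p, y in zip(percent_t2dm, is_t2dm) if not y and p < threshold)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j
```

On a toy input such as percents `[0, 0, 20, 40, 50, 60, 100, 100]` with gold-standard T2DM flags `[False, False, False, False, True, False, True, True]`, the function selects a threshold of 50 with J of about 0.8. Because the classification is binary, swapping which type counts as "positive" swaps sensitivity and specificity but leaves J unchanged, which is why the same threshold is optimal for both T1DM and T2DM.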
Using the optimal threshold, we calculated sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy ([true positives+true negatives]/all persons) and their 95% CIs overall and by groups (age, race/ethnicity, BMI, and number of encounters). We stratified analyses by age <20 years (youth) and ≥20 years (adults) on October 1, 2015, to be consistent with the SEARCH Registry Study categorization of youth as age <20 years. We also stratified by healthcare provider specialty at the encounter level. We generated ROC curves and calculated areas under the curves (AUCs) for each group. We chose not to conduct formal statistical testing for groups because the analyses were exploratory in nature and sample sizes for some groups were small.
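The performance measures listed above can be computed from a 2×2 table as in the following sketch. The article does not state which method was used for the 95% CIs; the normal-approximation (Wald) interval used here, clipped to the range 0 to 1, is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def wald_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (proportion, lower, upper) using a normal-approximation
    interval clipped to [0, 1]. The interval method is an assumption."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Sensitivity, specificity, PPV, NPV, and accuracy with 95% CIs,
    treating T2DM as the positive class."""
    return {
        "sensitivity": wald_ci(tp, tp + fn),
        "specificity": wald_ci(tn, tn + fp),
        "ppv": wald_ci(tp, tp + fp),
        "npv": wald_ci(tn, tn + fn),
        # Accuracy = (true positives + true negatives) / all persons
        "accuracy": wald_ci(tp + tn, tp + fp + fn + tn),
    }
```

As a usage example with made-up counts (not study data), `diagnostic_metrics(tp=90, fp=5, fn=10, tn=195)` gives a sensitivity of 0.90, a specificity of 0.975, and an accuracy of 0.95, each with its interval.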
This study was approved by the KPSC institutional review board.
We identified 1911 persons with T1DM (74.6%) and 652 persons with T2DM (25.4%) from KPSC diagnosed at age <20 years who had at least one ICD-10-CM diabetes code from October 1, 2015 to November 30, 2016 (table 1). Age on October 1, 2015, ranged from 1.0 to 33.7 years (mean (SD)=19.1 (6.5) years). Weight and height were measured at the index encounter for 2231 (89.1%) and 2067 (81.9%) persons, respectively. For other persons, the majority had height or weight measured within 2 months of the index encounter. We were able to calculate BMI for 2487 (99.7%) persons in the study. Among persons with T2DM, 85.6% had encounters with T2DM codes only, 4.8% with T1DM codes only, 6.6% with T2DM and T1DM codes, and 3.1% with a mixture of T2DM, T1DM, secondary diabetes, or neonatal diabetes codes. Among persons with T1DM, 79.8% had encounters with T1DM codes only, 2.8% with T2DM codes only, 15.2% with T2DM and T1DM codes, and 2.2% with a mixture of T2DM, T1DM, secondary diabetes, or neonatal diabetes codes.
The ROC curve for classifying diabetes type had an AUC of 0.95 (95% CI 0.94 to 0.96). The threshold of ≥50% of T2DM-coded encounters yielded the largest Youden’s index and was selected as the optimal threshold. Using this threshold, we observed 90.6% (95% CI 88.4% to 92.9%) T2DM sensitivity (T1DM specificity), 96.3% (95% CI 95.4% to 97.1%) T2DM specificity (T1DM sensitivity), 89.3% (95% CI 86.9% to 91.6%) T2DM PPV (T1DM NPV), 96.8% (95% CI 96.0% to 97.6%) T2DM NPV (T1DM PPV), and 94.8% (95% CI 94.0% to 95.7%) accuracy (table 2).
The AUC for persons aged <20 years on October 1, 2015 was 0.98 (95% CI 0.97 to 0.99), compared with an AUC of 0.92 (95% CI 0.94 to 0.96) for persons aged ≥20 years on October 1, 2015 (table 2). Point estimates for sensitivity, specificity, PPV, NPV, and accuracy ranged from 92.5% to 98.0% for persons aged <20 years and from 86.0% to 95.1% for persons aged ≥20 years.
Accuracy and AUC point estimates ranged from 92.0% to 97.4% and from 0.93 to 0.96, respectively, across race/ethnicity groups. The point estimate for T2DM sensitivity (T1DM specificity) was highest among Asian/Pacific Islanders (94.8%, 95% CI 89.1% to 100.0%) and lowest among non-Hispanic whites (84.2%, 95% CI 76.0% to 92.4%) (table 2). The point estimate for T2DM specificity (T1DM sensitivity) was highest among non-Hispanic whites (98.9%, 95% CI 98.1% to 99.7%) and lowest among Asian/Pacific Islanders (91.5%, 95% CI 85.1% to 98.0%).
Performance of ICD-10-CM codes varied across BMI category, number of encounters, and provider specialty. ICD-10-CM codes performed better in persons who were overweight/obese (AUC 0.95, 95% CI 0.94 to 0.96) compared with persons who were underweight/normal weight (AUC 0.84, 95% CI 0.75 to 0.92). For persons who were underweight or normal weight, T2DM sensitivity and PPV (T1DM specificity and NPV) were 65.1% (95% CI 50.9% to 79.4%) and 66.7% (95% CI 52.4% to 80.9%), respectively. In contrast, for persons who were overweight/obese, T2DM sensitivity and PPV were 92.5% (95% CI 90.4% to 94.6%) and 91.0% (95% CI 88.7% to 93.3%), respectively. This results from the high percentage of misclassification of persons with T2DM as T1DM among persons who were underweight/normal weight (15 of 43, 34.9%) compared with persons who were overweight/obese (44 of 587, 7.5%).
Persons with 1–3 encounters had T2DM sensitivity of 93.0% (95% CI 90.2% to 95.8%) and T2DM specificity of 94.7% (95% CI 92.9% to 96.4%). Persons with 4–6 encounters had T2DM sensitivity and specificity of 88.9% (95% CI 84.3% to 93.5%) and 97.1% (95% CI 95.9% to 98.4%), respectively. Persons with ≥7 encounters had T2DM sensitivity and specificity of 87.9% (95% CI 82.8% to 93.0%) and 97.0% (95% CI 95.6% to 98.3%), respectively. AUC and accuracy were similar across groups with different numbers of encounters. The sensitivity of classifying diabetes type using endocrinology encounters and other encounters was 85.7% (95% CI 81.6% to 89.8%) and 91.2% (95% CI 88.8% to 93.5%), respectively. PPV using endocrinology encounters was 81.6% (95% CI 77.1% to 86.0%), and PPV using other encounters was 89.8% (95% CI 87.3% to 92.2%). The accuracy was approximately 95% for both.
Our findings provide evidence that ICD-10-CM codes can be used to accurately classify diabetes type among persons with youth-onset (diagnosed at age <20 years) T1DM or T2DM across a broader age range including youth and young adults. Using ≥50% T2DM-coded encounters to discriminate between T1DM and T2DM yielded an overall accuracy of 94.8% (95% CI 94.0% to 95.7%), with sensitivity, specificity, PPV, and NPV near or above 90%.
ICD-10-CM codes performed well in most age and race/ethnicity groups and across a range of number of encounters. The observed differences in the point estimates for sensitivity and specificity between non-Hispanic whites and Asian/Pacific Islanders are potentially due to the small sample size for Asian/Pacific Islanders in the study. We observed accuracy ≥92% and AUC ≥0.92 in all but one group (stratified by age, race/ethnicity, BMI, and number of encounters). Classifying diabetes type was more difficult in persons who were underweight/normal weight. The fact that the preponderance of persons with T2DM were overweight/obese (93.2% in our study) might explain disproportionate assignment of T1DM ICD-10-CM codes to underweight/normal weight persons with T2DM. Using additional information such as BMI might improve diabetes classification, but this information might not be widely available or easily accessible in EHRs and administrative databases. In contrast, ICD-10-CM codes are available from EHRs and claims data across diverse care settings and geographic regions. Thus, our study focused on evaluating whether diagnosis codes alone could accurately distinguish between T1DM and T2DM. Moreover, some prior studies have reported that adding medication dispensing and laboratory test values to classification criteria with ICD-9-CM codes did not significantly improve the ability to distinguish diabetes type.6 7
More T2DM false negatives (T1DM false positives) among endocrinology encounters (14.3%) compared with other encounters (8.8%) contributed to the lower point estimates of sensitivity and PPV for endocrinology encounters. Persons with T2DM per the gold standard made up a smaller percentage of endocrinology visits (16.0%) than of other visits (26.9%). Therefore, endocrinologists might be more likely than other providers to assign T1DM codes to persons with T2DM. However, the overall accuracies were approximately 95% for both groups. Moreover, overall assignment of diabetes type by ICD-10-CM codes used a combination of both types of encounters, lessening the influence of endocrinology encounters on the overall classification.
Compared with prior studies of ICD-9-CM codes, our present study using ICD-10-CM codes had comparable sensitivity and specificity but had a higher PPV. It is possible that differences in disease prevalence in the populations studied contributed to this difference. Lawrence et al6 used information from KPSC’s EHRs to investigate the utility of ICD-9-CM codes alone or in combination with medications and laboratory values to determine T1DM and T2DM among youth with diabetes. The criterion that performed the best for T1DM was having ≥1 outpatient code for T1DM (250.x1 or 250.x3), which yielded sensitivity, specificity, PPV, accuracy, and AUC >93% and NPV of 84.2%. For T2DM, the criterion of having no outpatient T1DM diagnosis code performed the best, with sensitivity, specificity, NPV, accuracy, and AUC >92% and PPV of 81.8%. Zhong et al used ICD-9-CM codes alone or in combination with medication use and laboratory values to determine diabetes type among youth with diabetes at two academic healthcare centers.7 8 For T1DM, the ratio of the number of T1DM billing codes to the sum of T1DM and T2DM billing codes ≥0.5 was the best criterion and was better than using counts alone (eg, ≥1 T1DM codes); sensitivity, specificity, and PPV were >92%. For T2DM, using the ratio of T2DM to the sum of T1DM and T2DM codes ≥0.4 yielded sensitivity and specificity above 87% but a PPV <70%. Adding medication and laboratory data did not improve performance.7
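The suggestion above that prevalence differences could explain PPV differences follows from Bayes' rule: with sensitivity and specificity held fixed, PPV rises with the prevalence of the positive class. The sketch below illustrates this; the function name is hypothetical, and the 10% prevalence in the usage note is an arbitrary illustrative value, not a figure from any of the cited studies.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """Positive predictive value implied by sensitivity, specificity, and
    prevalence of the positive class (all given as proportions in [0, 1])."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive classifications; PPV is undefined")
    return true_pos / (true_pos + false_pos)
```

Using the overall T2DM sensitivity (0.906) and specificity (0.963) reported in this study with the observed T2DM share of the cohort (652 of 2563), `ppv_from_prevalence(0.906, 0.963, 652 / 2563)` gives about 0.89, in line with the reported PPV. Holding sensitivity and specificity fixed and lowering prevalence to a hypothetical 10% reduces the implied PPV to about 0.73.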
In a study of adults with diabetes by Klompas et al,9 the ratio of T1DM to T2DM codes >0.5 yielded 63% sensitivity and 95% PPV for T1DM and 100% sensitivity and 90% PPV for T2DM. They reported that a set of optimized criteria that included ICD-9-CM codes, plasma C-peptide, autoantibody levels, and medications captured more persons with diabetes than a single criterion. However, in a recent external validation of the Klompas optimized criteria for T1DM, Schroeder et al found that a simpler criterion of only ICD-9-CM codes had a PPV of 96.4%, which was comparable to PPVs (range: 94.5%–96.4%) obtained using all or part of the Klompas optimized criteria.10
In our study, we found the percentage of persons with discordant codes (ie, codes for a diabetes type(s) that is different from the gold standard) to be low. However, a prior study reported that 37% of youth with T1DM in an academic healthcare system had discordant diabetes ICD-9-CM codes.7 It is possible that a single payer system within KPSC could have contributed to more accurate coding. The ability of ICD-10-CM codes to distinguish between diabetes type might be lower in external systems, but further studies are needed.
Our study has several potential limitations. We used diabetes type obtained within 6 months of diabetes diagnosis as the gold standard. Because we included persons diagnosed with diabetes from 2001 or earlier through 2016, physicians might have changed diabetes type after initial assessments and before our study period. However, this is estimated to affect few people. In addition, we studied members of an integrated healthcare delivery system serving Southern California, which might limit generalizability of our results to other healthcare delivery systems. Regardless, KPSC members are demographically diverse, and the study provides valuable insights about the performance of ICD-10-CM codes to determine diabetes type among persons from different race/ethnicity groups. Moreover, we did not assess the utility of using ICD-10-CM codes to identify diabetes cases, as our study focused on distinguishing between diabetes types among persons known to have diabetes. In addition, while we included 95% CIs, we did not conduct formal statistical tests comparing groups because of potentially limited power to detect differences in some groups with small numbers. Moreover, we were more interested in the overall ability of the ICD-10-CM codes to distinguish between T1DM and T2DM in a population-based cohort than in specific groups. Finally, our study is limited to youth-onset diabetes, and results might not apply to adult-onset diabetes, where fewer people are diagnosed with T1DM.
Our study has multiple strengths. We leveraged information from EHRs on a sizeable sample of persons enrolled in a large, managed health plan with rigorous diabetes case ascertainment and validation conducted as part of the SEARCH study protocol. We report the performance of newly implemented ICD-10-CM codes to classify T1DM and T2DM with more granularity than other studies by member characteristics such as age, race/ethnicity, BMI, and number of healthcare encounters.
We show that ICD-10-CM codes from the EHRs of a large, integrated healthcare delivery system can be used to accurately classify diabetes type, an important component of diabetes surveillance, among persons with youth-onset T1DM and T2DM. The increasing use of EHRs and the widespread availability of diagnostic codes in administrative and billing claims make ICD-10-CM codes an attractive data source for rapid and cost-efficient diabetes surveillance.
The authors would like to thank Byron Robinson, PhD, for helpful insights and comments that greatly improved the manuscript.
The SEARCH for Diabetes in Youth Study is indebted to the participants, their families, and their health-care providers, for making this study possible.
Contributors GCC researched data, contributed to study design, and wrote the manuscript. XL conducted statistical analysis, contributed to statistical design, and researched data. SYT contributed to study design and reviewed and edited the manuscript. JMS contributed to statistical design and reviewed and edited the manuscript. CK reviewed and edited the manuscript. JML oversaw the study, researched data, contributed to study design, and reviewed and edited the manuscript.
Funding The SEARCH for Diabetes in Youth Study has been funded by the Centers for Disease Control and Prevention (awards U48/CCU919219, U01DP000246, U18DP002714, and U18DP006133) with support from the National Institute of Diabetes and Digestive and Kidney Diseases.
Competing interests JML reports grants from the Centers for Disease Control and Prevention during the conduct of the study.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.