Introduction

A genetic diagnosis of monogenic diabetes enables some patients to stop insulin treatment. Patients with MODY caused by mutations in the transcription factor genes HNF1A and HNF4A are sensitive to low-dose sulfonylureas [1] and those with GCK mutations do not require pharmacological treatment. Most patients with neonatal diabetes caused by mutations in the KCNJ11 or ABCC8 genes encoding the K-ATP channel subunits Kir6.2 or SUR1 achieve improved glycaemic control with high-dose sulfonylureas [2].

Current diagnostic testing uses Sanger sequencing as the gold standard to detect base substitutions and small indels (insertions or deletions). An additional assay, often multiplex ligation-dependent probe amplification (MLPA), is required for the identification of partial/whole gene deletions or duplications [3]. Genetic testing is usually restricted to a small subset of genes according to the patient’s phenotype [4].

Next-generation sequencing technology provides the potential for simultaneous analysis of all the known disease genes in a single assay at a similar cost to testing a few genes by Sanger sequencing. Targeted assays for gene panels ranging from two to 105 genes have been developed for polycystic kidney disease [5], Bardet–Biedl/Alström syndrome [6] and retinal disease [7]. We developed a targeted next-generation sequencing assay to identify mutations causing monogenic diabetes and tested it in a cohort of patients in whom previous testing for MODY or neonatal diabetes had failed to confirm a genetic diagnosis.

Methods

We designed a custom Agilent SureSelect exon-capture assay (Agilent Technologies, Santa Clara, CA, USA) with baits for 29 genes (see electronic supplementary material [ESM] Methods). These included 13 known/putative MODY genes (GCK, HNF1A, HNF4A, HNF1B, NEUROD1, INS, CEL, PDX1, PAX4, BLK, KLF11, KCNJ11 and ABCC8), two genes where mutations cause diabetes through lipodystrophy (LMNA and PPARG), the m.3243 region of the mitochondrial genome (where the m.3243A>G mutation causes MIDD) and 20 neonatal diabetes genes (GCK, KCNJ11, ABCC8, INS, PDX1, PTF1A, HNF1B, NEUROD1, NEUROG3, RFX6, EIF2AK3, FOXP3, GLIS3, SLC19A2, SLC2A2, IER3IP1, ZFP57, WFS1, GATA6 and GATA4). Bait density and replication were adjusted using coverage data from exome sequencing samples, captured with the Agilent SureSelect Human All Exon v1 (38 Mb) system (see ESM Fig. 1), to achieve greater uniformity of capture across the 66.8 kb target.
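The three gene lists overlap, so their sizes do not simply add up. The short Python sketch below takes the set union of the lists exactly as given above. The variable names are illustrative only. One reading consistent with the counts is that the 28 distinct nuclear genes plus the mitochondrial m.3243 target make up the 29 targets; that interpretation is an assumption, not something the text states.

```python
# Gene lists copied from the Methods text; variable names are illustrative.
MODY = {"GCK", "HNF1A", "HNF4A", "HNF1B", "NEUROD1", "INS", "CEL", "PDX1",
        "PAX4", "BLK", "KLF11", "KCNJ11", "ABCC8"}
LIPODYSTROPHY = {"LMNA", "PPARG"}
NEONATAL = {"GCK", "KCNJ11", "ABCC8", "INS", "PDX1", "PTF1A", "HNF1B",
            "NEUROD1", "NEUROG3", "RFX6", "EIF2AK3", "FOXP3", "GLIS3",
            "SLC19A2", "SLC2A2", "IER3IP1", "ZFP57", "WFS1", "GATA6", "GATA4"}

# Genes that appear in both the MODY and the neonatal diabetes lists.
shared = MODY & NEONATAL

# Distinct nuclear genes across all three lists.
nuclear_genes = MODY | LIPODYSTROPHY | NEONATAL

print(len(MODY), len(LIPODYSTROPHY), len(NEONATAL))  # list sizes as stated
print(sorted(shared))                                # overlapping genes
print(len(nuclear_genes))                            # distinct nuclear genes
```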

DNA samples from a total of 114 patients were tested: 32 with known mutations identified by Sanger sequencing, MLPA dosage or array CGH (comparative genomic hybridisation) and 82 previously tested for MODY (n = 33, diagnosed at age <35 years, BMI <30 kg/m², who had undergone previous testing for GCK, HNF1A and/or HNF4A) or neonatal diabetes diagnosed before 6 months (n = 49, all previously tested for mutations in at least KCNJ11, ABCC8 and INS) but in whom a mutation had not been found. Study participants gave informed consent and these investigations were carried out in accordance with the Declaration of Helsinki as revised in 2000.

Samples were fragmented using a Bioruptor (Diagenode, Liège, Belgium), indexed for multiplexing and hybridised (in pools of 12 samples) according to the manufacturer’s instructions. Sequencing was performed on an Illumina HiSeq 2000 (Illumina, San Diego, CA, USA) with 48 samples per lane, using 100 bp paired-end reads. Data were processed as described previously [8] to identify potential pathogenic mutations located within 50 bp upstream and 10 bp downstream of each exon. Deletions/duplications >30 bp were identified by relative read depth coverage. All newly identified mutations were confirmed by Sanger sequencing.
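The region of analysis described above (50 bp upstream and 10 bp downstream of each exon) can be expressed as a simple coordinate window. The Python sketch below is illustrative only: the `Exon` type, the function names and the example coordinates are hypothetical, and it assumes that "upstream" and "downstream" are defined relative to the direction of transcription, which the text does not state explicitly. It does not reproduce the published pipeline [8].

```python
from typing import Iterable, NamedTuple, Tuple


class Exon(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive genomic coordinate
    end: int     # 1-based, inclusive genomic coordinate
    strand: str  # "+" or "-"


def region_of_interest(exon: Exon, upstream: int = 50,
                       downstream: int = 10) -> Tuple[int, int]:
    """Return the (start, end) genomic window analysed around one exon."""
    if exon.strand == "+":
        return max(1, exon.start - upstream), exon.end + downstream
    if exon.strand == "-":
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        return max(1, exon.start - downstream), exon.end + upstream
    raise ValueError(f"unknown strand: {exon.strand!r}")


def in_region_of_interest(chrom: str, pos: int, exons: Iterable[Exon]) -> bool:
    """True if a variant position falls inside any exon's analysis window."""
    for exon in exons:
        if exon.chrom != chrom:
            continue
        start, end = region_of_interest(exon)
        if start <= pos <= end:
            return True
    return False


# Hypothetical exons, for illustration only.
example_exons = [Exon("chr1", 1000, 1100, "+"), Exon("chr1", 5000, 5100, "-")]
print(in_region_of_interest("chr1", 951, example_exons))   # 49 bp before a plus-strand exon
print(in_region_of_interest("chr1", 1111, example_exons))  # 11 bp after the same exon
```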

Results

The targeted next-generation sequencing assay ‘captured’ the protein coding regions and conserved splice sites of the 28 monogenic diabetes genes and the m.3243 region of the mitochondrial genome from the patients’ DNA samples by hybridisation. These DNA fragments were then amplified and sequenced on an Illumina HiSeq 2000 to generate multiple reads per base. The average read depth across the targeted gene regions in the 114 samples was 257 per base (SD ±85) with ≥30 reads for 97.5% of bases and ≥20 reads for 98.0% of bases. For the 20 genes where testing is currently available in our laboratory by Sanger sequencing, the average depth of coverage was 272 and 99.4% of bases had a minimum read depth of 30 (Fig. 1a). One specific region of low coverage was observed across a ∼300 bp GC-rich region of GATA6 exon 2 (Fig. 1b).
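Coverage statistics of this kind (mean depth and the percentage of bases at or above a given depth) can be derived from a list of per-base read depths. The Python sketch below is a minimal illustration; the function name `coverage_summary` and the toy depth values are assumptions, and the thresholds simply mirror those plotted in Fig. 1a.

```python
from typing import Dict, Sequence, Tuple


def coverage_summary(depths: Sequence[int],
                     thresholds: Sequence[int] = (2, 10, 20, 30)
                     ) -> Tuple[float, Dict[int, float]]:
    """Return mean depth and the percentage of bases with depth >= each threshold."""
    n = len(depths)
    if n == 0:
        raise ValueError("no bases supplied")
    mean_depth = sum(depths) / n
    pct_at_or_above = {
        t: 100.0 * sum(1 for d in depths if d >= t) / n for t in thresholds
    }
    return mean_depth, pct_at_or_above


# Toy per-base depths for four targeted bases (not study data).
toy_mean, toy_pct = coverage_summary([0, 5, 25, 40])
print(toy_mean)  # mean depth across the four bases
print(toy_pct)   # percentage of bases at each minimum depth
```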

Fig. 1 Depth of coverage for targeted genes. (a) The percentage of targeted bases sequenced at a minimum depth of 2, 10, 20 and 30 reads per base for regions of interest for all 28 MODY/neonatal diabetes genes and the m.3243 base (black bars) and for the 20 genes routinely tested by Sanger sequencing in our laboratory (white bars). Data are mean values for the cohort of 114 samples; error bars show 1 SD from the mean. (b) Region of low coverage (<30 reads per base) within GATA6 exon 2. The graph shows the average depth of coverage (black line) in the region chr18:19,751,740–19,752,040 (hg19 coordinates); grey shading indicates 1 SD from the mean. Average coverage depth over the entire GATA6 exon 2 region of interest (chr18:19,751,056–19,752,250) was 126.

We identified all 34 mutations in the 32 positive control samples (Table 1). These included 19 base substitutions, ten small insertions or deletions (≤27 bases) and five partial/whole gene deletions/duplications (see ESM Fig. 2). The mosaic GATA6 c.1303-1G>T mutation was present in 142/598 reads (24%). In these samples a total of 36 different polymorphisms previously identified by Sanger sequencing were confirmed. There were no false-positive variant calls. The sensitivity and specificity for variant identification were both 100% (Clopper–Pearson 95% CI 94.9, 100).
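The quoted interval can be checked with an exact (Clopper–Pearson) binomial calculation. The Python sketch below assumes the denominator is the 70 distinct variants in the positive controls (34 mutations plus 36 polymorphisms), all of which were detected; the text does not state the denominator explicitly. The bounds are found by bisection on the binomial tail probabilities, which avoids external libraries. The function names are illustrative.

```python
from math import comb


def _binom_prob(k_min: int, k_max: int, n: int, p: float) -> float:
    """P(k_min <= X <= k_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, k_max + 1))


def _bisect_increasing(func, lo: float = 0.0, hi: float = 1.0,
                       iters: int = 100) -> float:
    """Root of an increasing function on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    # Lower bound: p at which P(X >= x) equals alpha/2 (zero when x == 0).
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: _binom_prob(x, n, n, p) - alpha / 2)
    # Upper bound: p at which P(X <= x) equals alpha/2 (one when x == n).
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - _binom_prob(0, x, n, p))
    return lower, upper


lower, upper = clopper_pearson(70, 70)
print(f"{100 * lower:.1f}, {100 * upper:.0f}")  # prints "94.9, 100"
```

When every variant is detected, the lower bound reduces to the closed form (α/2)^(1/n), which for n = 70 gives 0.949.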

Table 1 Mutations identified by targeted next-generation sequencing

Previously unidentified mutations were found in 5/33 patients (15%) referred for MODY testing (Table 1). A mitochondrial m.3243A>G mutation was found in two patients in whom HNF1A testing had been requested and confirmed a diagnosis of MIDD. Mutations were identified in the GCK, HNF1B and HNF4A genes in three patients who had previously been tested for HNF1A and HNF4A. The intronic HNF4A mutation, located five bases from the exon (c.358+5G>A), was not detected by genetic testing in 1999 because analysis was restricted to the exons and conserved splice sites (±2 bp). This mutation is predicted to reduce the splicing efficiency of the intron 3 splice donor site and was detected in four additional diabetic relatives.

Mutations were found in 9/49 patients (18%) with neonatal diabetes (Table 1). For eight of these nine patients the mutated gene had not been analysed previously. In three cases the mutations were in genes (EIF2AK3 or SLC19A2) where neonatal diabetes is usually part of a syndrome but the initial testing was performed soon after the diagnosis of diabetes, before other features had developed. Novel mutations were found in GCK, PDX1 and GATA6 in five patients. The p.R826W ABCC8 mutation had not been detected previously due to allelic dropout caused by a polymorphism (rs139233603) within a primer binding site that was not listed on variant databases at the time of the Sanger sequencing analysis. This patient and her affected sibling were compound heterozygotes for a previously detected inactivating ABCC8 mutation (c.580-1G>A). Both have now transferred from insulin to sulfonylurea therapy.

Discussion

This novel targeted next-generation sequencing capture assay provides a sensitive method for simultaneous testing for mutations in 29 known/putative monogenic diabetes genes. The types of mutations identified include base substitutions, small indels and large deletions/duplications. A genetic diagnosis was obtained for 14/82 (17%) patients in whom testing had previously been limited to a subset of these genes.

Twelve of the 14 newly identified mutations were in genes that had not been tested previously because extrapancreatic features characteristic of the genetic subtype were not present (e.g. no known renal disease in a patient with an HNF1B mutation as previously described by Edghill et al [9]), had not yet presented (e.g. skeletal dysplasia in Wolcott–Rallison syndrome or deafness and megaloblastic anaemia in TRMA syndrome) or were not noted at referral for genetic testing (e.g. deafness in MIDD). Two mutations were in genes where previous testing had yielded a false-negative result because the mutation was located outside the region of analysis (HNF4A c.358+5G>A) or was not detected due to allelic dropout at the PCR stage (ABCC8 p.R826W). For the two siblings with the ABCC8 p.R826W mutation the test result allowed them to stop their insulin therapy and replace it with sulfonylurea tablets.

The error rate for next-generation sequencing is estimated to be 1% [10] and therefore multiple reads are required to obtain equivalent sensitivity to Sanger sequencing. We used a bait balancing strategy (increased number of baits within regions of low coverage predicted from exome sequencing using the same SureSelect capture system) to achieve the recommended minimum of 30 reads/base for clinical diagnostic testing for 99.4% of bases in the 20 genes currently tested by Sanger sequencing (97.5% of bases with ≥30 reads across all 29 genes). Only one GC-rich region of GATA6 (<0.3 kb out of the total 66.8 kb sequenced) proved difficult to capture and may require supplementary Sanger sequencing. The high, even coverage enabled us to detect all 70 unique mutations or polymorphisms in the positive controls. Although the assay showed 100% sensitivity and specificity, validation studies of further variants are required to narrow the CIs.
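The link between read depth, error rate and sensitivity can be illustrated with a simple binomial model. The Python sketch below makes several assumptions that are not taken from the study: reads at a heterozygous site carry the variant allele with probability 0.5, errors occur independently at 1% per read, and a variant is called only when a minimum number of variant reads is seen (the thresholds of 2 and 6 reads are hypothetical). It is not the variant-calling method used here.

```python
from math import comb


def binom_prob(k_min: int, k_max: int, n: int, p: float) -> float:
    """P(k_min <= X <= k_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, k_max + 1))


def miss_probability(depth: int, min_alt_reads: int,
                     alt_fraction: float = 0.5) -> float:
    """Chance that a true heterozygous variant yields too few variant reads."""
    return binom_prob(0, min_alt_reads - 1, depth, alt_fraction)


def false_call_probability(depth: int, min_alt_reads: int,
                           error_rate: float = 0.01) -> float:
    """Chance that sequencing errors alone reach the calling threshold.

    Treating every error as the same wrong base makes this an upper bound.
    """
    return binom_prob(min_alt_reads, depth, depth, error_rate)


# Compare a shallow and a deeper site under the hypothetical thresholds.
for depth, min_alt in [(10, 2), (30, 6)]:
    print(depth,
          miss_probability(depth, min_alt),
          false_call_probability(depth, min_alt))
```

Under these assumptions both the chance of missing a heterozygous variant and the chance of an error-driven call fall sharply as depth rises from 10 to 30 reads, which is the rationale for a minimum depth requirement.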

Methods for detecting large deletions or duplications (also known as copy number variants) in targeted next-generation sequencing data have lagged behind calling of base substitutions and small indels. Difficulties arise from the low and variable coverage in many other targeted assays. Our high, even coverage enabled the detection of two multi-exonic duplications and three single or multi-exon deletions by cross-sample normalisation and comparison. If further studies confirm the sensitivity of this approach it might be possible to replace both Sanger sequencing and dosage analysis by MLPA with a single targeted next-generation sequencing assay for monogenic diabetes.
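Detection of deletions and duplications by cross-sample normalisation and comparison can be sketched as follows. This Python example is a simplified illustration rather than the method used in the study: each sample's per-exon depth is divided by that sample's total depth, the median across samples serves as the reference, and exons whose ratio to the reference falls below 0.7 or above 1.3 are flagged. The function name, the thresholds and the toy depths are all assumptions.

```python
from statistics import median
from typing import Dict, Tuple

Depths = Dict[str, Dict[str, float]]  # sample -> exon -> mean read depth


def call_copy_number(depths: Depths, loss: float = 0.7,
                     gain: float = 1.3) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Flag exons whose normalised depth deviates from the cross-sample median."""
    # Step 1: normalise within each sample to remove differences in total yield.
    norm = {}
    for sample, exons in depths.items():
        total = sum(exons.values())
        norm[sample] = {exon: d / total for exon, d in exons.items()}

    # Step 2: per-exon reference is the median normalised depth across samples.
    exon_ids = list(next(iter(depths.values())))
    reference = {e: median(norm[s][e] for s in norm) for e in exon_ids}

    # Step 3: compare each sample with the reference and apply the thresholds.
    calls = {}
    for sample in norm:
        for exon in exon_ids:
            if reference[exon] == 0:
                continue  # exon not captured in most samples; cannot be assessed
            ratio = norm[sample][exon] / reference[exon]
            if ratio <= loss:
                calls[(sample, exon)] = ("deletion", ratio)
            elif ratio >= gain:
                calls[(sample, exon)] = ("duplication", ratio)
    return calls


# Toy data: sample S5 has roughly half the expected depth at exon E2.
toy_depths = {
    "S1": {"E1": 200, "E2": 300, "E3": 250, "E4": 250},
    "S2": {"E1": 240, "E2": 360, "E3": 300, "E4": 300},
    "S3": {"E1": 160, "E2": 240, "E3": 200, "E4": 200},
    "S4": {"E1": 220, "E2": 330, "E3": 275, "E4": 275},
    "S5": {"E1": 200, "E2": 150, "E3": 250, "E4": 250},
}
toy_calls = call_copy_number(toy_depths)
print(toy_calls)
```

A heterozygous deletion is expected to give a ratio near 0.5 and a heterozygous duplication a ratio near 1.5, so the approach depends on coverage being even enough for these shifts to stand out from sample-to-sample variation.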

Our study suggests that the implementation of targeted next-generation sequencing for clinical diagnostic testing will increase the number of patients with confirmed monogenic diabetes. A genetic diagnosis is important since it defines the diagnostic subtype, determines the most appropriate treatment and informs the sibling recurrence risk or risk of diabetes in offspring. This targeted next-generation sequencing assay may also prove to be a useful pre-screen before exome or genome sequencing for disease gene discovery.