Conclusions
Of the recommended tests, the included articles investigated the reliability of the four-site monofilament,26 128 Hz tuning fork,23 24 26–29 38 39 VPT,24–26 30–34 36 37 pinprick,28 33 39 ankle reflex23 24 28 35 and proprioception.28 This review found that the inter-rater and intra-rater reliability of recommended neurological tests varies widely when they are performed in people with diabetes. Based on the limited data available, pooled analyses suggest that VPT and ankle reflexes demonstrate acceptable reliability, whereas the reliability of pinprick and 128 Hz tuning fork tests is questionable. Additionally, cohort studies suggest that the four-site monofilament also demonstrates acceptable reliability,26 whereas the reliability of proprioception may be inadequate.28 These findings should be considered in the context of the results of the QAREL assessment and the variability in methodological reporting, in conjunction with the wide CIs for the adjusted pooled estimates of reliability (eg, the intra-rater reliability of the 128 Hz tuning fork, κ=0.32 (0.05 to 0.60)) and the variability of results, which indicate that the available evidence is of low or moderate quality. Of note, although these tests are included in IDF, IWGDF and ADA guidelines, we did not identify any article reporting the reliability of the three-site monofilament, light touch, Ipswich Touch Test or temperature perception tests in people with diabetes. These results need to be considered in light of the established capacity of the 10 g monofilament and 128 Hz tuning fork to predict the development of foot wounds.40 41
The findings of this systematic review highlight the need for more exhaustive investigation of the reliability of recommended chairside tests for DPN. A number of the studies assessing the reliability of DPN testing reported that 100% of their cohorts had DPN,23 24 27 30 34 37 39 making the weak to moderate inter-rater and intra-rater reliability reported concerning. Although they do not infer diagnostic accuracy, studies of reliability are affected by disease prevalence.42 Therefore, when conducted in a cohort in which all participants have the target disease, the results are likely to overstate the reproducibility of the measurement.42 In the case of tests such as monofilament testing, for which pooled estimates of diagnostic accuracy have shown low sensitivity of 0.53 and adequate specificity of 0.88, the likelihood of a false negative test result is high for any given test point.43 This is consistent with our findings of weak to moderate test reliability even in populations consisting entirely of participants with DPN. As chairside DPN testing is used for both the diagnosis and ongoing monitoring of DPN, the usefulness of a test that has limited capacity to rule out the presence of the target disease or to reproduce a positive result in those with the disease is questionable. Furthermore, given that the earliest nerve damage in DPN is likely to be to small fibers,44 the reliability of chairside small-fiber tests is underinvestigated. We identified three studies that included investigation into the reliability of pinprick. However, we did not identify any studies investigating the reliability of thermal perception, and our present review did not investigate question-based tests such as the Total Symptom Score.12 In this context, the reliability of large-fiber tests such as monofilament and vibration perception needs to be considered together with their limited ability to detect early disease. Further research is thus warranted to determine the reliability of tests capable of detecting early disease.
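The link between low sensitivity and limited reproducibility in an all-DPN cohort can be illustrated with a short piece of arithmetic. The sketch below uses the pooled monofilament sensitivity of 0.53 quoted above. It assumes, purely for illustration, that two applications of the test to the same participant are statistically independent; that assumption and the function names are ours and are not taken from the included studies.

```python
# Illustrative arithmetic only. The independence assumption is a simplification
# introduced for this sketch, not a finding of the review.

SENSITIVITY = 0.53  # pooled monofilament sensitivity quoted in the text


def false_negative_rate(sensitivity: float) -> float:
    """Probability that a participant with the disease tests negative."""
    return 1.0 - sensitivity


def expected_raw_agreement_all_diseased(sensitivity: float) -> float:
    """Chance that two independent applications of the test give the same
    result (both positive or both negative) when every participant has
    the disease."""
    miss = false_negative_rate(sensitivity)
    return sensitivity ** 2 + miss ** 2


fnr = false_negative_rate(SENSITIVITY)
agreement = expected_raw_agreement_all_diseased(SENSITIVITY)

print(f"False-negative rate: {fnr:.2f}")
print(f"Expected raw agreement between two applications: {agreement:.2f}")
```

Under these assumptions, close to half of participants with DPN would test negative at a given application, and two applications would give the same result only about half of the time. Repeated measurements on the same person are unlikely to be fully independent in practice, so this is a simplified illustration rather than an estimate of observed reliability.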
Methodological differences between included studies are likely to have contributed to the range of results available in the literature. Reliability of various chairside tests was reportedly affected by limited training or variances in experience levels of clinicians23 26 28 34 35 39 and also by inconsistent comprehension of individual test instructions by participants.23 24 26 32 39 Tests such as the tuning fork, monofilament and pinprick all rely on application of controlled pressure by the clinician. As the rate of pressure is difficult to control for, especially between different raters, several studies identified this as possibly influencing test reliability.23 24 26–28 38 39 These issues suggest that adequate clinician training should be undertaken, that the training should be consistent with guidelines and that the instructions to patients should be clear, all of which may lead to improved reliability of chairside tests. Clinically, reliability can be improved through consideration of recommendations from current guidelines regarding test technique and test sites.12–14 The included literature is limited by small sample sizes,26 34 35 37 39 lack of blinding of assessors to previous results28 30 and heterogeneity in the measures of statistical agreement used. Although the majority of studies used kappa values, some used COV, Spearman’s rho, percentage agreement or ICCs, making comparison of available data across testing methods challenging.
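To illustrate why results reported as percentage agreement cannot be compared directly with kappa values, the sketch below computes both statistics from the same set of paired ratings. The ratings are hypothetical data constructed for this example (20 participants, two raters, 1 = abnormal and 0 = normal) and are not drawn from any included study. The function names are ours.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of participants on whom the two raters gave the same result."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in set(a) | set(b)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings: 14 both abnormal, 2 both normal, 4 discordant.
rater_a = [1] * 14 + [0] * 2 + [1] * 2 + [0] * 2
rater_b = [1] * 14 + [0] * 2 + [0] * 2 + [1] * 2

agreement = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)

print(f"Percentage agreement: {agreement:.0%}")
print(f"Cohen's kappa: {kappa:.3f}")
```

In this constructed example the raters agree on 80% of participants, yet kappa is 0.375 once chance agreement is removed, so the same data could be described as showing high or only fair agreement depending on the statistic reported.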
This review has highlighted the need for further investigation of the reliability of chairside DPN testing. Due to the range of reliability estimates and the varied reliability measures across all recommended neurological tests, more extensive research is needed into the reliability of pinprick, proprioception and other recommended chairside DPN tests that have not been investigated. Furthermore, future research should be conducted in specific populations with diabetes and in populations where the prevalence of DPN has been established through testing methods with high diagnostic accuracy. Given the additional impacts of age on neurological and cognitive function beyond those resulting from diabetes, there may be age-specific differences in the reliability of chairside tests; as such, investigations taking age into account are required. In addition, simplifying neurological testing will allow test instructions to be communicated more clearly between clinicians and patients and will reduce the variability between clinicians performing the tests, improving overall reliability. Furthermore, increased clinical knowledge of the reliability of neurological screening tests allows for more informed clinical decision making when selecting multiple tests (eg, monofilament and tuning fork) to aid in the diagnosis and monitoring of DPN.
Although the search strategy employed in this review was designed to be robust, there may be some evidence that was not captured, for example, unpublished data. It should also be acknowledged that the chairside tests included in this review were drawn from three international consensus statements only. Other commonly used chairside neuropathy tests that warrant further investigation include the monofilament test using additional sites for all-cause peripheral neuropathy,45 conventional and graduated tuning forks,46 two-point discrimination,47 temperature sensation and the Michigan Neuropathy Screening Instrument.48 Lastly, future studies investigating test reliability should ensure adequate reporting, including sufficient detail on cohort characteristics and methodology, and should use appropriate statistical tests, for example, kappa or intraclass correlation coefficients with relevant CIs.
This systematic review found evidence of acceptable reliability for VPT using a biothesiometer, neurothesiometer or maxivibrometer, ankle reflexes and the four-site monofilament test. Due to the large range of reported reliability for the 128 Hz tuning fork, we are unable to appropriately comment on this testing method. These results support the clinical use of the identified tests for screening and ongoing monitoring of DPN as recommended by the latest IDF, IWGDF and ADA guidelines. The reliability of temperature perception (IDF and ADA), pinprick, proprioception (ADA), three-site monofilament and Ipswich Touch Test (IWGDF) when performed in people with diabetes remains unclear and warrants investigation to determine their suitability for use in this population.