Research report
Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians’ diagnoses

https://doi.org/10.1016/S0165-0327(02)00237-9

Abstract

Background: The aim of this study was to compare the validity of the Hospital Anxiety and Depression Scale (HADS), the WHO (five) Well Being Index (WBI-5), the Patient Health Questionnaire (PHQ), and physicians’ recognition of depressive disorders, and to recommend specific cut-off points for clinical decision making.

Methods: A total of 501 outpatients completed each of the three depression screening questionnaires and received the Structured Clinical Interview for DSM-IV (SCID) as the criterion standard. In addition, treating physicians were asked to give their psychiatric diagnoses. Criterion validity and Receiver Operating Characteristics (ROC) were determined. Areas under the curves (AUCs) were compared statistically.

Results: All depression scales showed excellent internal consistencies (Cronbach’s α: 0.85–0.90). For ‘major depressive disorder’, the operating characteristics of the PHQ were significantly superior to those of both the HADS and the WBI-5. For ‘any depressive disorder’, the PHQ again showed the best operating characteristics, but the overall difference did not reach statistical significance at the 5% level. Cut-off points that can be recommended for the screening of ‘major depressive disorder’ had sensitivities of 98% (PHQ), 94% (WBI-5), and 85% (HADS). Corresponding specificities were 80% (PHQ), 78% (WBI-5), and 76% (HADS). In contrast, physicians’ recognition of ‘major depressive disorder’ was poor (sensitivity, 40%; specificity, 87%).

Limitations: Our sample may not be representative of medical outpatients, but sensitivity and specificity are independent of disorder prevalence.

Conclusions: All three questionnaires performed well in depression screening, but significant differences in criterion validity existed. These results may be helpful in the selection of questionnaires and cut-off points.
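The limitation noted in the abstract, that sensitivity and specificity do not depend on disorder prevalence, can be illustrated with a small calculation. The sketch below is not part of the study: it takes the PHQ sensitivity (98%) and specificity (80%) reported above for ‘major depressive disorder’ and applies Bayes' rule at three hypothetical prevalences. The function name `predictive_values` and the prevalence values are assumptions chosen for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# PHQ operating characteristics as reported in the abstract.
SENSITIVITY, SPECIFICITY = 0.98, 0.80

# Hypothetical prevalences (assumed for illustration, not study data).
for prevalence in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Sensitivity and specificity are fixed inputs here, whereas the predictive values change with prevalence. A cut-off validated in one sample can therefore give a different proportion of true cases among screen-positives in a setting with a different base rate.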

Introduction

Depressive disorders are associated with high levels of personal suffering, increased disability days, and elevated risk of cardiovascular mortality and suicide (Wells et al., 1989, Broadhead et al., 1990, Ormel et al., 1994, Simon et al., 1995, Frasure-Smith et al., 2000, Penninx et al., 2001, Posternak and Miller, 2001). Unfortunately, physicians detect only 30–50% of patients with depression in primary care (Nielsen and Williams, 1980, Perez-Stable et al., 1990, Ormel et al., 1991, Docherty, 1997, Williams et al., 1999, Hansen et al., 2001). As a result, depression often goes undetected and so remains untreated (Gelenberg, 1999). Major and minor depressive disorders respond well to psychotherapy and/or treatment with antidepressants (Miranda and Munoz, 1994, Coulehan et al., 1997, Schulberg et al., 1998, Whooley and Simon, 2000, Williams et al., 2000, Jarrett et al., 2001), which emphasises the need to improve recognition by clinicians. Recently, it has been demonstrated that screening for depression can be cost-effective if screening costs are low and effective treatments are given (Valenstein et al., 2001). Screening costs stay low when questionnaires are entirely self-administered and require only a couple of minutes for patients to complete and physicians to review. International and well-established screening questionnaires that meet these requirements are the Hospital Anxiety and Depression Scale (HADS; Zigmond and Snaith, 1983), the WHO (five) Well Being Index (WBI-5; WHO, 1998a), and the Patient Health Questionnaire (PHQ; Spitzer et al., 1999). Clinicians and researchers need to know which of the available screening instruments can be recommended for clinical use, how valid their results are, and whether they are superior to recognition by physicians working with medical outpatients. In addition, users of screening questionnaires need to know optimal cut-off points for detecting depressive disorders according to DSM-IV (American Psychiatric Association, 2000).

The purpose of this study was to determine the comparative validity of the Hospital Anxiety and Depression Scale (HADS), the WHO (five) Well Being Index (WBI-5), the Patient Health Questionnaire (PHQ), and physicians’ recognition of depressive disorders. Specifically, this study aimed:

  • (1) to investigate internal consistency and intercorrelations of the three depression scales;

  • (2) to analyse the operating characteristics of the depression scales and physicians’ diagnoses according to an independent criterion standard for depressive disorders;

  • (3) to determine if any one screening instrument is superior to the others in diagnosing DSM-IV depressive disorders;

  • (4) to determine optimal cut-off points for discriminating between subjects with and without depressive disorders.
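Aims (2) to (4) rest on three quantities: sensitivity and specificity at a given cut-off, the area under the ROC curve, and a rule for choosing a cut-off. The sketch below illustrates them on invented data. The scores, the diagnoses, and the function names are hypothetical. Youden's J is used as the selection rule purely as a common convention, not as the criterion the authors applied, and the AUC is computed by simple pairwise comparison rather than with the statistical comparison of correlated AUCs used in the study.

```python
def operating_characteristics(scores, has_disorder, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is screen-positive."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)
    fn = sum(1 for s, d in pairs if d and s < cutoff)
    tn = sum(1 for s, d in pairs if not d and s < cutoff)
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, has_disorder):
    """AUC as the probability that a random case outscores a random non-case
    (ties count as one half)."""
    cases = [s for s, d in zip(scores, has_disorder) if d]
    controls = [s for s, d in zip(scores, has_disorder) if not d]
    wins = sum((c > n) + 0.5 * (c == n) for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def best_cutoff(scores, has_disorder):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1
    (an assumed selection rule, not taken from the study)."""
    def youden(cutoff):
        sens, spec = operating_characteristics(scores, has_disorder, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden)


# Invented questionnaire scores and criterion-standard diagnoses.
scores = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 18]
has_disorder = [False, False, False, False, False, True,
                False, True, True, True, True]

cutoff = best_cutoff(scores, has_disorder)
sens, spec = operating_characteristics(scores, has_disorder, cutoff)
print(f"cut-off={cutoff}  sensitivity={sens:.2f}  specificity={spec:.2f}")
print(f"AUC={auc(scores, has_disorder):.3f}")
```

In practice a screening cut-off is often chosen to favour sensitivity over specificity, so a rule that weights both equally, as Youden's J does, is only one of several reasonable choices.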

Section snippets

Subjects

The study was performed in the outpatient clinics of Heidelberg Medical Hospital and 12 family practices in Heidelberg from August 2000 to July 2001. On predetermined days, patients visiting these sites were asked to participate in our study and to complete a set of questionnaires during their waiting time. With the aim of performing 500 Structured Clinical Interviews for DSM-IV (SCID; First et al., 1995, Wittchen et al., 1997) as the criterion standard for the presence of depressive disorders,

Internal consistency and intercorrelations

The internal consistency of all three depression scales was excellent: Cronbach’s α for the PHQ was 0.88; the HADS, 0.86; and the WBI-5, 0.91. The substantial intercorrelations of 0.74 (HADS×PHQ), −0.73 (WBI-5×PHQ), and −0.76 (HADS×WBI-5) demonstrate that the three scales measure nearly the same construct.
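For readers unfamiliar with the statistic, Cronbach’s α for a scale of k items is k/(k − 1) × (1 − Σ item variances / variance of the total score). The following sketch computes it on an invented response matrix. The data and the function name `cronbach_alpha` are hypothetical and have no connection to the study sample.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                                   # number of items
    item_variances = [variance(item) for item in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented data: 5 respondents x 5 items, each item scored 0-3.
toy_responses = [
    [0, 1, 0, 1, 0],
    [1, 1, 1, 2, 1],
    [2, 2, 1, 2, 2],
    [3, 2, 3, 3, 3],
    [1, 0, 1, 1, 0],
]

print(f"alpha = {cronbach_alpha(toy_responses):.2f}")
```

Values approach 1 when the items rise and fall together across respondents, which is the sense in which the reported coefficients indicate high internal consistency.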

Comparative validity for ‘major depressive disorder’

Table 1 shows the operating characteristics of the depression scales and the physicians’ diagnoses for ‘major depressive disorder’ for three potential cut-off points for each instrument, and

Discussion

The main purpose of our study was to investigate the criterion validity of three international screening instruments for depression, and to determine whether they differ significantly regarding their ability to diagnose DSM-IV depressive disorders. Previous comprehensive reviews (Meakin, 1992, Mulrow et al., 1995) have demonstrated reasonable operating characteristics for several case-finding instruments for depression, but significant differences between instruments remain elusive. To our

Acknowledgments

This study was supported by unrestricted research grants from Pfizer, Germany, and from the medical faculty of the University of Heidelberg, Germany (project 121/2000). There are no conflicts of interest. First of all, we thank our patients and their doctors, who collaborated in this study and made this work possible. We are very grateful to our students Levke Willand and Ingeborg Warnke, who played an important role in data collection. Susanne Geercken, MA, Pfizer, reviewed the German

References (57)

  • P. Bech et al. The WHO (Ten) Well-Being Index: validation in diabetes. Psychother. Psychosom. (1996)
  • B. Bracken et al. State of the art procedures for translating, validating and using psychoeducational tests in cross-cultural assessment. School Psychol. Int. (1991)
  • W.E. Broadhead et al. Depression, disability days, and days lost from work in a prospective epidemiologic survey. J. Am. Med. Assoc. (1990)
  • J.L. Coulehan et al. Treating depressed primary care patients improves their physical, mental, and social functioning. Arch. Intern. Med. (1997)
  • E.R. De Long et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics (1988)
  • C. Diez-Quevedo et al. Validation and utility of the Patient Health Questionnaire in diagnosing mental disorders in 1003 general hospital Spanish inpatients. Psychosom. Med. (2001)
  • J.P. Docherty. Barriers to the diagnosis of depression in primary care. J. Clin. Psychiatry (1997)
  • M.B. First et al.
  • J.L. Fleiss. Measuring nominal scale agreement among many raters. Psychol. Bull. (1971)
  • N. Frasure-Smith et al. Social support, depression, and mortality during the first year after myocardial infarction. Circulation (2000)
  • A. Gelenberg. Depression is still underrecognized and undertreated. Arch. Intern. Med. (1999)
  • C. Herrmann et al.
  • R. Heun et al. Internal and external validity of the WHO Well-Being Scale in the elderly general population. Acta Psychiatr. Scand. (1999)
  • R.B. Jarrett et al. Preventing recurrent depression using cognitive therapy with and without a continuation phase: a randomized clinical trial. Arch. Gen. Psychiatry (2001)
  • J.G. Johnson et al. Health problems, impairment and illnesses associated with bulimia nervosa and binge eating disorder among primary care and obstetric gynaecology patients. Psychol. Med. (2001)
  • H.C. Kraemer et al. Measuring the potency of risk factors for clinical or policy significance. Psychol. Methods (1999)
  • K. Kroenke. Depression screening is not enough. Ann. Intern. Med. (2001)
  • K. Kroenke et al. Similar effectiveness of paroxetine, fluoxetine, and sertraline in primary care: a randomized trial. J. Am. Med. Assoc. (2001)