Original Article
Missing data in a multi-item instrument were best handled by multiple imputation at the item score level
Introduction
Missing data on multi-item instruments are a frequent problem in epidemiological and medical studies. Multi-item instruments can be used to measure, for example, quality of life, coping ability, or other psychological states. A multi-item instrument generally consists of several items that measure one construct [1]; for example, the Pain Coping Inventory assesses the active coping skills of people with pain complaints using 12 items [2]. Missing data on these instruments can occur as missing item scores, when several items are not completed, or as missing total scores, when the entire instrument is not filled out. Furthermore, missing item scores impair the calculation of the total score, which can lead to missing total scores as well.
For missing data in item and total scores, different missing data-handling methods are available, with complete-case analysis (CCA) being the most frequently used [3]. In general, CCA tends to perform well under the strict assumption that the missing data are a completely random subsample of the data, in other words, missing completely at random (MCAR) [4]. However, CCA reduces power because it decreases the sample size. Single-imputation methods such as mean imputation of the total score and item mean imputation preserve the sample size by replacing missing values with the mean score, but they reduce the variability in the data. Single stochastic regression imputation (SRI) uses the observed data to predict each missing value and adds residual error to the imputed value to restore that variability, but it does not take the uncertainty of the imputed values into account.
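The single-imputation methods described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy item matrix, and the use of a single predictor in the regression step are assumptions made purely for illustration.

```python
import numpy as np


def item_mean_imputation(items):
    """Replace each missing item score with the observed mean of that item."""
    items = np.asarray(items, dtype=float)
    item_means = np.nanmean(items, axis=0)  # one mean per item (column)
    return np.where(np.isnan(items), item_means, items)


def total_mean_imputation(items):
    """Replace each missing total score with the mean of the observed totals."""
    items = np.asarray(items, dtype=float)
    totals = items.sum(axis=1)  # NaN as soon as one item is missing
    return np.where(np.isnan(totals), np.nanmean(totals), totals)


def stochastic_regression_imputation(x, y, rng):
    """Impute missing y from a linear regression on x plus a random residual."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones(observed.sum()), x[observed]])
    beta, *_ = np.linalg.lstsq(design, y[observed], rcond=None)
    residuals = y[observed] - design @ beta
    sigma = np.sqrt(residuals @ residuals / (observed.sum() - 2))
    imputed = y.copy()
    n_missing = int((~observed).sum())
    # The added noise restores variability that a plain prediction would remove.
    imputed[~observed] = (
        beta[0] + beta[1] * x[~observed] + rng.normal(0.0, sigma, n_missing)
    )
    return imputed


# Toy data (hypothetical): 4 respondents x 3 items, np.nan marks a missing score.
items = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 4.0],
    [3.0, 4.0, 5.0],
    [np.nan, np.nan, np.nan],
])
imputed_items = item_mean_imputation(items)
imputed_totals = total_mean_imputation(items)
```

In this toy example, both respondents with missing data receive the same total score under total mean imputation, which shows how mean-based methods shrink the variability in the data.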
In most cases, the probability of missing data depends on other observed variables, a mechanism referred to as missing at random (MAR) [4]. In contrast to traditional methods such as CCA and mean imputation, more advanced methods such as multiple imputation (MI) produce reliable and unbiased results under the MAR mechanism and take missing data uncertainty into account [5], [6]. Both traditional and advanced methods can be applied either to the missing item scores or directly to the missing total scores.
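One way to see how MI takes missing data uncertainty into account is the standard pooling step known as Rubin's rules, in which the variance between imputed data sets is added to the average within-data-set variance. The sketch below assumes that a coefficient and its standard error have already been estimated in each of m imputed data sets; the function name and the example numbers are hypothetical, and the text above does not state that the authors pooled their results exactly this way.

```python
import numpy as np


def pool_rubin(estimates, standard_errors):
    """Pool m estimates and standard errors from imputed data sets (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(standard_errors, dtype=float)
    m = q.size
    pooled_estimate = q.mean()
    within_var = np.mean(se ** 2)   # average sampling variance
    between_var = q.var(ddof=1)     # variance caused by the imputations
    total_var = within_var + (1.0 + 1.0 / m) * between_var
    return pooled_estimate, np.sqrt(total_var)


# Hypothetical coefficients and SEs from m = 3 imputed data sets.
pooled_coef, pooled_se = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

The pooled standard error is larger than any single within-data-set standard error whenever the estimates differ across imputations, which is the extra uncertainty that single imputation ignores.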
Missing data methods for item-level and total score-level missingness in questionnaire data are seldom compared within one study [3]. Other simulation studies have examined the performance of missing data methods applied to nonquestionnaire data [7], [8] or only studied methods applied to the item scores of a multi-item instrument [9], [10], [11], [12], [13]. For example, Burns et al. [13] studied the performance of MI of missing item scores but did not compare this with imputing at the total score level of their questionnaire. It therefore remains unclear whether it is better to apply a missing data-handling method to the missing item scores or to the total scores when some or many items in a multi-item instrument are missing. Moreover, the impact of different missing data methods on study results when multi-item data are missing on the covariate has not yet been researched extensively. The present study aimed to explore the performance of different missing data-handling methods designed for missing item scores and missing total scores in a multivariate regression model. This objective is considered in two respects: (1) which missing data methods should be used to handle missing (item) data and (2) whether the chosen method should be applied to the item scores or to the total scores.
Simulation set up
To investigate the differences between several imputation methods, we used a simulation procedure comparable with the study performed by Marshall et al. [7]. We based our simulation on an empirical data set, which was previously used in a prospective cohort study investigating the prognosis of low back pain [14]. In this study, we used a cross-sectional part of these data that contained the multi-item variable active coping of the Pain Coping Inventory (PCI-active) [2]. The PCI-active consists
Results
In Table 2, the regression coefficient and standard error (SE) estimates for the PCI-active total score under the three missing data mechanisms are presented. Not surprisingly, for the MCAR data, the coefficient estimate was the same as the true coefficient value, but the SE increased with higher missing data rates. A similar trend was seen in the MAR and missing not at random (MNAR) situations, although accompanied by much larger deviations in the SEs.
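The MCAR pattern reported here (an essentially unbiased coefficient with an SE that grows as more data are missing) can be reproduced with a small toy simulation. The sketch below does not use the authors' data or model: the simulated sample, the true slope of 0.5, the missing data rates, and the function names are all assumptions, and a simple regression stands in for the regression model of the study.

```python
import numpy as np


def ols_slope_and_se(x, y):
    """Slope and standard error of a simple linear regression of y on x."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sigma2 = residuals @ residuals / (len(y) - 2)
    covariance = sigma2 * np.linalg.inv(design.T @ design)
    return beta[1], np.sqrt(covariance[1, 1])


def cca_under_mcar(x, y, missing_rate, rng):
    """Delete cases completely at random, then analyse the complete cases."""
    keep = rng.random(len(x)) >= missing_rate
    return ols_slope_and_se(x[keep], y[keep])


rng = np.random.default_rng(2024)
n = 2000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # assumed true slope of 0.5

# Map each missing data rate to (slope, SE) from the complete-case analysis.
results = {rate: cca_under_mcar(x, y, rate, rng) for rate in (0.1, 0.5, 0.75)}
```

Because the deletion is unrelated to any variable, the slope stays close to its true value at every rate, while the SE rises as the effective sample size shrinks.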
Figures 1 and 2 present the effect of the missing data-handling methods on
Discussion
Our results indicate that missing item data are best handled by applying MI based on PMM or SR to the item scores, regardless of how many subject scores and item scores are missing. Furthermore, single SRI also seems to yield acceptable results, whereas mean imputation of the total scores performs worst. Additionally, we showed that the underlying missing data mechanism influences the performance of the missing data-handling method, especially when large amounts of data are missing. This is of concern
References (38)
- et al. Missing covariate data in medical research: to impute is better than to ignore. J Clin Epidemiol (2010)
- et al. Multiple imputation was an efficient method for harmonizing the Mini-Mental State Examination with missing item-level data. J Clin Epidemiol (2011)
- et al. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol (2006)
- et al. Measurement in medicine (2011)
- et al. Pain-coping strategies in chronic pain patients: psychometric characteristics of the pain-coping inventory (PCI). Int J Behav Med (2003)
- et al. Missing data: a systematic review of how they are reported and handled. Epidemiology (2012)
- Inference and missing data. Biometrika (1976)
- et al. Statistical analysis with missing data, Second Edition (2002)
- et al. Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study. BMC Med Res Methodol (2010)
- et al. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychol Methods (2001)
- Item imputation without specifying scale structure. Methodology
- Incidence of missing item scores in personality measurement, and simple item-score imputation. Methodology
- Imputing cross-sectional missing data: comparison of common techniques. Aust N Z J Psychiatry
- Missing data in multiple item scales: a Monte Carlo analysis of missing data techniques. Organizational Res Methods
- The effectiveness of high-intensity versus low-intensity back schools in an occupational setting: a pragmatic randomized controlled trial. Spine
- Modern applied statistics with S. Fourth Edition
- A toolkit in SAS for the evaluation of multiple imputation methods. Stat Neerlandica
- Influence of imputation and EM methods on factor analysis when item nonresponse in questionnaire data is nonignorable. Multivariate Behav Res
Funding: This work was financially supported by EMGO Institute of Health and Care Research.
Conflict of interest: None.