Original Article
Analysis of case-cohort data: A comparison of different methods
Introduction
The case-cohort design was introduced by Prentice [1] and is useful in analyzing cohort data in which failure is rare because covariate information is collected only from all failures and a random sample (with sampling probability α) of the censored observations, referred to as the subcohort. The design is also efficient if the collection of detailed follow-up information is costly and time-consuming. For example, if newly diagnosed cases are relatively easily obtained for the full cohort by contacting a disease-specific registry, but follow-up for mortality or movement outside the study area (to estimate follow-up time) is more difficult, one may consider the case-cohort design. With the selection of a random subcohort, follow-up for the disease is still necessary for the full cohort, but follow-up for other censoring events is restricted to members of the subcohort. For this purpose, the design was used in the Netherlands Cohort Study [2]. In addition, the subcohort is chosen without regard to any outcome and thus may serve as a comparison group for several different diseases. This might be efficient if biological samples are needed, which then need to be retrieved only once for the comparison group. Furthermore, if applicable, DNA extraction needs to be done only once.
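The sampling step described above can be sketched as follows. This is a minimal illustration, not the procedure used in any particular study: the record keys `id` and `is_case`, the function name `draw_case_cohort`, and the use of independent Bernoulli draws with probability α (rather than a fixed subcohort size) are all assumptions made for the example.

```python
import random


def draw_case_cohort(cohort, alpha, seed=0):
    """Return (subcohort_ids, analysis_ids) for a case-cohort sample.

    cohort: list of dicts with hypothetical keys 'id' and 'is_case'.
    alpha:  subcohort sampling probability, 0 < alpha <= 1.
    """
    rng = random.Random(seed)
    # The subcohort is drawn without looking at the outcome.
    subcohort_ids = {p["id"] for p in cohort if rng.random() < alpha}
    # Covariate information is then needed for the subcohort plus every case.
    case_ids = {p["id"] for p in cohort if p["is_case"]}
    return subcohort_ids, subcohort_ids | case_ids
```

Because the subcohort does not depend on the outcome, the same `subcohort_ids` could be reused as the comparison group when `is_case` is redefined for a different disease.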
Three different weighting methods have been proposed, which differ in the way they handle the weighting of the subcohort members and the cases outside the subcohort [1], [3], [4]. All three methods are incorporated in an SAS macro written by Barlow and Ichikawa and made available through Statlib (http://lib.stat.cmu.edu/general/robphreg) [5]. They also compared the three methods with the nested case–control and the full-cohort analysis in a data set described by Breslow and Day [6]. This cohort, however, is very small (full-cohort size: n = 679) and includes only 56 failures. Most current cohorts used for etiologic research include far more subjects.
The purpose of this article is to compare the effect estimates and standard errors (SE) yielded by the three methods of analyzing case-cohort data with each other and with a full-cohort analysis in a large cohort. As an illustration, we investigate the relation between body mass index (BMI) and cardiovascular disease (CVD) using available cohort data. In addition, we use simulated data to study how the full-cohort size, the subcohort size, the number of cases, and the estimated effect size (i.e., the size of the relative risk) influence the three methods.
Section snippets
Weighting methods
For the analysis of case-cohort data, a pseudolikelihood is used instead of the partial likelihood that is normally used in analyzing full-cohort data [1], [5]. This pseudolikelihood amounts to a weighted Cox regression model [5]. The contribution of a failure to the likelihood function by person i at time t_j is
The first term in the denominator is the contribution by the case, weighted with weight w_i. The second term is the summation over
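The expression itself is not reproduced in this snippet. As a minimal sketch of how such a weighted contribution might be computed, the code below assumes a single covariate, an unweighted numerator exp(βz_i), and a denominator made up of the case term w_i·exp(βz_i) plus a weighted sum over the other people at risk. The weight scheme in `weight` is an assumption based on how the three methods are commonly summarized (labelled here "prentice", "self_prentice", and "barlow"); it should be checked against [1], [3], [4] before use. All function and parameter names are hypothetical.

```python
import math

METHODS = ("prentice", "self_prentice", "barlow")


def weight(method, in_subcohort, is_case, at_failure, alpha):
    """Denominator weight for one person at one failure time (assumed scheme)."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    if not in_subcohort:
        # Cases outside the subcohort enter only at their own failure time,
        # and are left out of the denominator under 'self_prentice'.
        if is_case and at_failure and method != "self_prentice":
            return 1.0
        return 0.0
    if method == "barlow":
        # A subcohort member stands in for 1/alpha cohort members, except a
        # case at its own failure time, which counts once.
        return 1.0 if (is_case and at_failure) else 1.0 / alpha
    return 1.0  # 'prentice' and 'self_prentice' weight the subcohort equally


def contribution(beta, z_case, w_case, others):
    """Pseudolikelihood contribution of one failure.

    others: iterable of (z_k, w_k) for the other people at risk at that time.
    """
    num = math.exp(beta * z_case)
    den = w_case * num + sum(w * math.exp(beta * z) for z, w in others)
    return num / den
```

Under these assumptions the methods differ only through the weights passed to `contribution`, which is consistent with the text's description of them as alternative ways of weighting subcohort members and cases outside the subcohort.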
Example
Table 2 shows the results of the example for the full-cohort (n = 15,768) and case-cohort analyses for each of the five subcohort sizes. With subcohorts larger than 1%, there was no difference between the three weighting methods: all methods yielded exactly the same estimates as well as identical robust SE.
However, only with subcohorts of 10% or larger were the estimates also comparable with those of the full-cohort analysis (i.e., BMI ≤ 23 kg/m² and α = 10%: βfull-cohort = −0.17, SE = 0.09; βPrentice =
Discussion
In our large cohort example, the three methods of analyzing case-cohort data resulted in very similar effect estimates and SE. Only with unrealistically small subcohort sizes of 1% or less did Prentice's method start to show estimates closer to the full-cohort estimates than the other two methods.
The simulation results again show that the three methods yield identical estimates in most situations. But when (sub)cohort sizes are small, the estimates of the method proposed by
References (9)
- et al. A large-scale prospective cohort study on diet and cancer in The Netherlands. J Clin Epidemiol (1990)
- et al. Analysis of case-cohort designs. J Clin Epidemiol (1999)
- A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika (1986)
- et al. Asymptotic distribution theory and efficiency results for case-cohort studies. Ann Stat (1988)