Title of article :
Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer
Baneshi, MR Health School - Kerman University of Medical Sciences - Department of Biostatistics and Epidemiology, Kerman; Talei, AR Shahid Faghihi Hospital - Shiraz University of Medical Sciences
Background: Multifactorial regression models are frequently used in medicine to
estimate the survival rate of patients across risk groups. However, their results
are not generalisable if the assumptions required are not satisfied when the
models are developed. Missing data are a common problem in pathology. The aim of
this paper is to address the danger of excluding cases with missing data, and to
highlight the importance of imputing missing data before developing multifactorial
models.
Methods: This study was performed on 310 breast cancer patients diagnosed in
Shiraz (Southern Iran). A complete-case Cox regression model was fitted, and a
prognostic index was calculated to categorise the patients into 3 risk groups.
Then, applying the Multivariate Imputation via Chained Equations (MICE) method,
missing data were imputed 10 times. Using imputed data sets, modelling was
performed to assign patients to risk groups. Estimated actuarial Overall Survival
(OS) rates corresponding to analysis of complete-case and imputed data sets
were compared.
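The abstract does not say how the estimates from the 10 imputed data sets were combined. One common choice is Rubin's rules, sketched below as an illustration only. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching inputs")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical 5-year OS estimates from three imputed data sets.
pooled, se = pool_rubin([0.80, 0.82, 0.78], [0.0004, 0.0004, 0.0004])
print(f"pooled OS = {pooled:.3f}, SE = {se:.4f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing values, so the pooled standard error is never smaller than the average within-imputation standard error.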
Results: Cases with at least one missing value had significantly better
survival than complete cases. Estimates derived from the complete-case data,
relative to the imputed data sets, underestimated the OS rate in all risk groups.
In addition, the complete-case confidence intervals were wider, indicating a loss
of precision due to the reduction in sample size and power.
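The link between sample size and interval width can be illustrated with an actuarial (life-table) survival estimate and its Greenwood standard error. This is a minimal sketch, not the authors' code. The function name `actuarial_survival` and both life tables are hypothetical; the second table simply halves every count in the first.

```python
from math import sqrt


def actuarial_survival(intervals):
    """Life-table survival estimate with Greenwood standard error.

    intervals: list of (entering, deaths, withdrawn) tuples, one per interval.
    Withdrawals are assumed to be at risk for half of the interval.
    Returns (cumulative survival, standard error) at the end of the last interval.
    """
    survival = 1.0
    greenwood_sum = 0.0
    for entering, deaths, withdrawn in intervals:
        at_risk = entering - withdrawn / 2.0   # effective number at risk
        if at_risk <= deaths:
            raise ValueError("deaths must be fewer than the effective number at risk")
        survival *= 1.0 - deaths / at_risk
        greenwood_sum += deaths / (at_risk * (at_risk - deaths))
    return survival, survival * sqrt(greenwood_sum)


# Hypothetical yearly life tables: all cases versus half as many cases.
full = [(200, 20, 10), (170, 16, 8), (146, 12, 6)]
half = [(100, 10, 5), (85, 8, 4), (73, 6, 3)]

s_full, se_full = actuarial_survival(full)
s_half, se_half = actuarial_survival(half)
print(f"full sample: S = {s_full:.3f}, SE = {se_full:.4f}")
print(f"half sample: S = {s_half:.3f}, SE = {se_half:.4f}")
```

Both tables give the same survival estimate, but the standard error for the halved table is larger by a factor of the square root of 2, which is the precision loss that discarding incomplete cases produces even when it introduces no bias.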
Conclusion: The results highlight the danger of excluding cases with missing data.
Imputation of missing data avoids biased estimates, increases the precision of
estimates, and improves generalisability of results to other similar populations.
Missing data, Multiple imputation, Breast neoplasm, Overall survival, Iran