ORIGINAL ARTICLE
Year: 2020 | Volume: 6 | Issue: 3 | Page: 194-198
Methods to Handle Incomplete Data
Vinny Johny1, Mariamma Philip2, Swathi Augustine1
1 Department of Community Medicine, Pushpagiri Institute of Medical Sciences and Research Centre, Kerala
2 Department of Biostatistics, National Institute of Mental Health and Neuro Sciences, The Tamil Nadu Dr MGR Medical University, Chennai
Correspondence Address:
Vinny Johny, Biostatistician, Department of Community Medicine, College of Medicine, Pushpagiri Institute of Medical Sciences and Research Centre, Kerala, India.
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/mamcjms.mamcjms_54_20
Context: The central question in data analysis is choosing an appropriate analytic approach when observations are incomplete. The most common way to handle missing data in a data set is imputation, in which missing values are estimated and filled in. An important challenge in imputation is maintaining the statistical significance of the data set.
Aim: To compare different imputation techniques: complete case analysis, last observation carried forward (LOCF), mean imputation, hot deck imputation, regression imputation, and multiple imputation (MI).
Settings and Design: The data were collected in a prospective study of the predictors of early response to treatment in drug-naïve schizophrenia patients at a tertiary care centre in India.
Methods and Material: The present study compares six methods (complete case analysis, LOCF, mean imputation, hot deck imputation, regression imputation, and MI) in filling in the missing values of the outcome variable.
Statistical analysis used: The paired t-test was used to compare the imputation methods.
Results: At the fourth week, the Positive and Negative Syndrome Scale scores were missing for 41% of the subjects. Mean imputation differed significantly from LOCF (P = 0.001), regression imputation (P = 0.010), and MI (P = 0.002). LOCF also differed significantly from regression imputation (P = 0.001), hot deck imputation (P = 0.011), and MI (P = 0.001).
Conclusions: LOCF and mean imputation differ from the other imputation methods, and there is no significant difference between hot deck imputation, MI, and regression imputation.
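To make two of the compared techniques concrete, the following is a minimal Python sketch of LOCF and mean imputation applied to a final-visit outcome, followed by a paired t statistic comparing the two sets of filled-in values. It is an illustration under stated assumptions, not the authors' implementation: the scores are invented toy numbers (not study data), the function names are hypothetical, and pairing the imputed outcomes subject by subject is an assumed reading of how the methods were compared.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy scores (NOT study data): one list per subject,
# one entry per weekly visit; None marks a missing visit.
scores = {
    "s1": [90, 80, 72, 65],
    "s2": [85, 78, None, None],
    "s3": [95, 88, 80, None],
    "s4": [70, 66, 60, 55],
}

def locf(series):
    """Last observation carried forward: fill each gap with the most
    recent observed value (a gap before any observation stays None)."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def mean_impute_final(data):
    """Replace each missing final-visit score with the mean of the
    final-visit scores that were actually observed."""
    observed = [s[-1] for s in data.values() if s[-1] is not None]
    fill = mean(observed)
    return {k: (s[-1] if s[-1] is not None else fill) for k, s in data.items()}

def paired_t(a, b):
    """Paired t statistic for two equal-length lists of outcomes."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Final-visit outcome under each method, paired by subject.
locf_final = {k: locf(s)[-1] for k, s in scores.items()}
mean_final = mean_impute_final(scores)
ids = sorted(scores)
t_stat = paired_t([locf_final[i] for i in ids], [mean_final[i] for i in ids])
print(locf_final, mean_final, round(t_stat, 3))
```

In this toy example LOCF assigns each dropout their own last recorded score, while mean imputation assigns every dropout the same value, which is one intuitive reason the two methods can yield different filled-in outcomes.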