eScholarship
Open Access Publications from the University of California

UCLA

UCLA Electronic Theses and Dissertations

A Probability Based Framework for Testing the Missing Data Mechanism

Abstract

Many methods exist for imputing missing data, but fewer methods have been proposed to test the missing data mechanism. Little (1988) introduced a multivariate chi-square test for the missing completely at random (MCAR) data mechanism that compares the observed means for each missing data pattern with expectation-maximization (EM) estimated means. As an alternative, this manuscript proposes two new ways of testing MCAR that use estimated parameters from missingness indicators rather than moment information from observed scores. The first statistic in the probability-based (PBB) family, PBB-MCAR I, is a chi-square test of independence that tests the assumption that missingness indicators are independent among all grouping patterns. The second statistic, PBB-MCAR II, is a chi-square goodness-of-fit statistic that tests differences between observed and expected probabilities conditional on ranked values of a suspect variable that drives missingness dependencies. A simulation study showed that although Little's test consistently maintained optimal Type I error rates, the empirical power of PBB-MCAR II to detect violations of MCAR was on par with Little's test under most conditions, whereas PBB-MCAR I had lower power to detect departures from MCAR because it tests a more restricted set of independence assumptions. These newly developed test statistics were demonstrated in two education-based applications: (a) as a way of testing the missing data mechanism when creating longitudinal trajectories of intramural sports participation among African American students, and (b) as a tool to detect departures from completely at random test-taking. Future work will involve creating an R package to promote the use of these missing data tests among education researchers, extending PBB-MCAR II to incorporate auxiliary variables, and resolving the problem of sparse missing data patterns by adopting the limited information goodness-of-fit test proposed by Maydeu-Olivares and Joe (2005).
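To make the idea of testing independence among missingness indicators concrete, the sketch below runs a generic Pearson chi-square test of independence on the missingness indicators of two variables. It is an illustration only and is not the PBB-MCAR I statistic, whose exact construction the abstract does not give. The function names (`missingness_indicator`, `chi_square_independence_2x2`), the use of `None` to mark a missing value, and the toy data are all assumptions made for this example.

```python
import math


def missingness_indicator(values):
    """Return 1 where a value is missing (None) and 0 where it is observed."""
    return [1 if v is None else 0 for v in values]


def chi_square_independence_2x2(r1, r2):
    """Pearson chi-square test of independence for two binary indicators.

    Returns (statistic, p_value); the reference distribution is chi-square
    with 1 degree of freedom. Illustrative only: no continuity correction
    and no safeguard against small expected counts.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("indicators must be non-empty and of equal length")
    n = len(r1)

    # Observed 2x2 table: rows index r1 (0/1), columns index r2 (0/1).
    observed = [[0, 0], [0, 0]]
    for a, b in zip(r1, r2):
        observed[a][b] += 1

    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of the two indicators.
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("an indicator is constant; test is undefined")
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy data (hypothetical): None marks a missing score.
x = [1, None, 3, None, 5, 6, None, 8]
y = [None, None, 1, None, 2, 3, None, 4]

stat, p = chi_square_independence_2x2(
    missingness_indicator(x), missingness_indicator(y)
)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value here would suggest that missingness on one variable is associated with missingness on the other, which is the kind of dependence among indicators that an MCAR test of this type looks for. With real data, the very small cell counts used in this toy example would make the chi-square approximation unreliable.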
