This Monte Carlo simulation study examined the performance of the most commonly used fit indices in selecting the "correct" latent class model while varying factors such as the true number of latent classes, the size of the latent classes (i.e., class prevalence), the nature of the latent classes, the number of indicators, and the sample size. Specifically, the fit indices examined were the Akaike Information Criterion (AIC), the Consistent Akaike Information Criterion (CAIC), the Bayesian Information Criterion (BIC), the adjusted Bayesian Information Criterion (ABIC), the adjusted Lo-Mendell-Rubin likelihood ratio test (LMR-LRT), the parametric bootstrapped likelihood ratio test (BLRT), the approximate Bayes Factor (BF), and the correct model probability (cmP). To date, no study has examined the performance of the BF and cmP in recovering the correct latent class model.
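The information-based indices listed above all penalize the same quantity, the model's deviance (-2 log-likelihood), by different functions of the number of free parameters and the sample size. The BF and cmP can likewise be approximated from the BIC. The sketch below shows the standard textbook formulas under stated assumptions. It is not the study's Mplus code. It assumes an unrestricted LCA with binary indicators, for which the parameter count is (K - 1) class prevalences plus K x J item-response probabilities. It also uses the common Schwarz (BIC-based) approximation for the BF and cmP. The function names and the log-likelihood values in the usage example are hypothetical and purely illustrative.

```python
import math


def lca_num_params(n_classes: int, n_indicators: int) -> int:
    """Free parameters in an unrestricted LCA, ASSUMING binary indicators:
    (K - 1) class-prevalence parameters plus K * J item-response probabilities."""
    return (n_classes - 1) + n_classes * n_indicators


def information_criteria(loglik: float, n_params: int, n: int) -> dict:
    """AIC, BIC, CAIC, and ABIC computed from a model's maximized log-likelihood."""
    dev = -2.0 * loglik  # deviance, -2LL
    return {
        "AIC": dev + 2 * n_params,
        "BIC": dev + n_params * math.log(n),
        "CAIC": dev + n_params * (math.log(n) + 1),
        "ABIC": dev + n_params * math.log((n + 2) / 24),  # sample-size adjusted BIC
    }


def approx_bayes_factor(bic_a: float, bic_b: float) -> float:
    """BIC-based approximate Bayes factor for model A versus model B.
    Values above 1 favor model A (the model with the smaller BIC)."""
    return math.exp(-0.5 * bic_a + 0.5 * bic_b)


def correct_model_probabilities(bics: list[float]) -> list[float]:
    """Approximate probability that each candidate is the correct model,
    i.e. exp(-BIC / 2) normalized over the set of candidate models."""
    sics = [-0.5 * b for b in bics]
    top = max(sics)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in sics]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # HYPOTHETICAL log-likelihoods for 1- to 4-class models (illustrative only).
    n, n_indicators = 500, 6
    logliks = [-1900.0, -1800.0, -1780.0, -1777.0]
    bics = []
    for k, ll in enumerate(logliks, start=1):
        ic = information_criteria(ll, lca_num_params(k, n_indicators), n)
        bics.append(ic["BIC"])
        print(k, {name: round(value, 1) for name, value in ic.items()})
    for k, prob in enumerate(correct_model_probabilities(bics), start=1):
        print(f"cmP for {k}-class model: {prob:.3f}")
```

Because the parameter count grows with both the number of classes and the number of indicators, the penalty terms make explicit why sample size and indicator count bear on which model each index selects. The indices differ only in how steeply they penalize that count.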
This simulation study also aimed to examine how sample size, the number of observed indicators, and class enumeration interact in latent class analysis (LCA) models. In other words, when observations are sampled from a larger population, is there a critical point at which the sample size and the number of indicators are no longer sufficient to uncover all existing heterogeneity? That is, at what point is the specificity of the emerging latent classes lost?
All data were generated and analyzed using the Mplus latent variable software (Muthén & Muthén, 1998-2013). The specific data generation and analysis conditions in this dissertation were selected based on a literature search of education and psychology databases. Results from this study will help applied researchers using LCA models better understand which fit indices to trust under various conditions during the class enumeration process. Specifically, the ABIC and BLRT emerged as the best-performing indices across the variety of conditions considered in this study. Results also highlight the practical importance of thoughtfully considering sample size and the number of indicators when estimating and interpreting LCA models. Findings of this dissertation provide evidence of a relatively strong interplay among sample size, number of indicators, and class enumeration in LCA models.