eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Previously Published Works

Predicting positive Clostridioides difficile test results using large-scale longitudinal data of demographics and medication history.

Abstract

BACKGROUND: Clostridioides difficile infection is a major health threat, and healthcare institutions have strong medical and financial incentives to keep infections under control. Blanket testing at admission is generally not recommended, and existing predictive models either used moderate sample sizes, inflated the number of covariates, or relied on non-interpretable algorithms. We aim to develop models that use patient data to predict positive Clostridioides difficile test results with high discrimination performance, interpretable results, and a reasonable number of covariates that reflect health over a long time span.

MATERIALS AND METHODS: We processed records from 157,493 University of California San Diego Health patients seen between January 1, 2016 and July 3, 2019 with at least 6 months of medication history, excluding pregnant women, patients under 18, and prisoners. Three models (Logistic Regression, Random Forest, and Ensemble) were constructed using hyperparameters selected through 10-fold cross-validation. Model performance was measured by the Area Under the Receiver Operating Characteristic Curve (AUROC). Model coefficients, odds ratios, and p-values were calculated for the Logistic Regression model, as were Gini indices for the Random Forest model. Decision boundary analysis was conducted using pair-wise comparisons of the false positive and false negative cases each model would predict at a specific threshold.

RESULTS: The Logistic Regression, Random Forest, and Ensemble models yielded test AUROCs of 0.839, 0.851, and 0.866, respectively. Significant covariates that may affect risk include age, immuno-compromised treatments, past antibiotic use, and some medications for the gastrointestinal tract.

CONCLUSIONS: The models achieve high discrimination performance (AUROC >0.83). The different analysis approaches broadly agree on the predictors that affect patients' chances of having a positive test, and that may therefore influence Clostridioides difficile risk, including features clinically proven to increase susceptibility. These human-interpretable models can help identify significant predictors that affect a patient's chance of testing positive, which may influence their Clostridioides difficile risk.
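
The abstract's two evaluation ideas, AUROC and the false positive and false negative cases at a chosen threshold, can be illustrated with a minimal Python sketch. The function names, the toy labels and scores, and the 0.5 threshold are all hypothetical and are not taken from the study. The sketch computes AUROC by its pairwise definition (the probability that a randomly chosen positive case is scored above a randomly chosen negative case), which may differ from how the authors computed it.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def errors_at_threshold(labels, scores, threshold):
    """Return (false_positives, false_negatives) when predicting 1 if score >= threshold."""
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return fp, fn


# Hypothetical toy data: 1 = positive test result, 0 = negative test result.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]

toy_auroc = auroc(toy_labels, toy_scores)
toy_fp, toy_fn = errors_at_threshold(toy_labels, toy_scores, 0.5)
print(toy_auroc, toy_fp, toy_fn)
```

The pairwise loop is quadratic in the number of cases, so it suits small illustrations only; on a cohort of the size described above, a rank-based computation or a library routine would be the practical choice.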
