ABSTRACT
Duke Anh LeTran
2022
Health Informatics
Background: In March 2020, COVID-19 was declared a pandemic, and in July 2020 a unique resource became available: the University of California (UC) COVID Research Data Set (CORDS) limited dataset (LDS). This database consists of harmonized observational electronic health record (EHR) data mapped from five UC academic health systems. Taking advantage of this confluence of events, this thesis has three aims: 1) to investigate health disparities trends in COVID-19 testing and positivity rates early in the pandemic; 2) to investigate EHR bias using a survival analysis of 30-day readmission based on the previous visit's last white blood cell (WBC) test; and 3) to use the first two aims as frameworks to qualitatively describe the informatics complexities inherent in using harmonized observational EHR data.
Method: For the analysis of COVID-19 testing and positivity rates by race and ethnicity, a cohort of 217,339 patients was identified. A generalized linear mixed effects model was fitted, and odds ratios were calculated. For the 30-day readmission survival analysis, a cohort of 907 individuals with a total of 2,121 visits was selected. A Cox proportional hazards regression model was fitted using the categorized time of day (TOD) and result value (RV) of the WBC test as features. Complexities associated with harmonized observational data and health informatics research were documented.
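A Cox model for 30-day readmission needs one row per index visit with a duration, an event flag, and the categorized last WBC result. The Python sketch below illustrates that data-preparation step under stated assumptions: the `Visit` structure and its field names are hypothetical (not the UC CORDS schema), the 6-hour time-of-day bins and the 4.0 to 11.0 x 10^9/L reference range are illustrative cutoffs rather than the ones used in this thesis, and a visit with no later admission is treated as censored at 30 days, which presumes at least 30 days of follow-up after every discharge.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

READMIT_WINDOW_DAYS = 30.0  # follow-up window for 30-day readmission


@dataclass
class Visit:
    """Hypothetical visit record; field names are illustrative, not the UC CORDS schema."""
    patient_id: str
    admit: datetime
    discharge: datetime
    # (result timestamp, WBC value in 10^9 cells/L) for each WBC test during the visit
    wbc_results: List[Tuple[datetime, float]] = field(default_factory=list)


def tod_category(ts: datetime) -> str:
    """Assumed 6-hour time-of-day bins for the WBC result timestamp."""
    if ts.hour < 6:
        return "night"
    if ts.hour < 12:
        return "morning"
    if ts.hour < 18:
        return "afternoon"
    return "evening"


def rv_category(value: float, low: float = 4.0, high: float = 11.0) -> str:
    """Assumed adult WBC reference range of 4.0 to 11.0 x 10^9/L."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def build_survival_rows(visits: List[Visit]) -> List[dict]:
    """One row per index visit with a WBC test: days to next admission, censored at 30."""
    by_patient: Dict[str, List[Visit]] = {}
    for v in visits:
        by_patient.setdefault(v.patient_id, []).append(v)

    rows = []
    for pid, patient_visits in by_patient.items():
        patient_visits.sort(key=lambda v: v.admit)
        for i, visit in enumerate(patient_visits):
            if not visit.wbc_results:
                continue  # no WBC result, so the visit cannot be categorized
            # Last WBC result of the index visit (latest timestamp).
            last_ts, last_value = max(visit.wbc_results, key=lambda r: r[0])
            event = False
            duration = READMIT_WINDOW_DAYS  # censored unless a readmission occurs
            if i + 1 < len(patient_visits):
                gap = patient_visits[i + 1].admit - visit.discharge
                gap_days = gap.total_seconds() / 86400
                if gap_days <= READMIT_WINDOW_DAYS:
                    event, duration = True, gap_days
            rows.append({
                "patient_id": pid,
                "tod": tod_category(last_ts),
                "rv": rv_category(last_value),
                "duration_days": duration,
                "event": event,
            })
    return rows
```

After one-hot encoding `tod` and `rv`, rows like these could be loaded into a pandas DataFrame and passed to a Cox routine such as lifelines' `CoxPHFitter.fit(df, duration_col="duration_days", event_col="event")`; because several visits can belong to one patient, passing `cluster_col="patient_id"` for a robust variance is one option worth considering.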
Results: 7,479 (3.4%) patients had a positive COVID-19 test. With respect to ethnicity, Hispanic/Latino patients had 3.2 times the odds of a positive COVID-19 test compared with Non-Hispanic/Latino patients (estimate 1.16, standard error 0.03). With respect to race, with White as the reference group, Native Hawaiian/Other Pacific Islander patients had 2.5 times the odds (estimate 0.90, standard error 0.14), African American/Black patients 1.7 times the odds (estimate 0.54, standard error 0.05), Other 1.3 times the odds (estimate 0.25, standard error 0.04), and Unknown 1.2 times the odds (estimate 0.22, standard error 0.39) of a positive COVID-19 test (all p-values < 0.0001). For the 30-day readmission survival analysis, the WBC TOD and RV categories were not statistically significant (p-values of 0.36 and 0.22, respectively). Informatics challenges and nuances of harmonized observational data were documented.
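The odds ratios in the Results correspond to exponentiated log-odds estimates, OR = exp(estimate). The Python sketch below shows that conversion together with a Wald 95% confidence interval, exp(estimate ± 1.96 × SE); it assumes the reported estimates are fixed-effect log-odds coefficients from the logistic mixed model and that Wald intervals are appropriate, neither of which is stated explicitly above.

```python
import math
from typing import Dict, Tuple

# (estimate, standard error) pairs as reported in the Results; assumed to be
# fixed-effect log-odds coefficients from the logistic mixed model.
REPORTED: Dict[str, Tuple[float, float]] = {
    "Hispanic/Latino vs Non-Hispanic/Latino": (1.16, 0.03),
    "Native Hawaiian/Other Pacific Islander vs White": (0.90, 0.14),
    "African American/Black vs White": (0.54, 0.05),
    "Other vs White": (0.25, 0.04),
    "Unknown vs White": (0.22, 0.39),
}

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio(estimate: float) -> float:
    """Odds ratio implied by a log-odds coefficient."""
    return math.exp(estimate)


def wald_ci(estimate: float, se: float, z: float = Z_95) -> Tuple[float, float]:
    """Wald confidence interval for the odds ratio, built on the log-odds scale."""
    return math.exp(estimate - z * se), math.exp(estimate + z * se)


if __name__ == "__main__":
    for label, (est, se) in REPORTED.items():
        print(f"{label}: OR = {odds_ratio(est):.2f}")
    low, high = wald_ci(*REPORTED["Hispanic/Latino vs Non-Hispanic/Latino"])
    print(f"Hispanic/Latino 95% CI: {low:.2f} to {high:.2f}")
```

Running it prints odds ratios of about 3.19, 2.46, 1.72, 1.28, and 1.25 for the five comparisons, and a Hispanic/Latino interval of about 3.01 to 3.38. The interval is built on the log-odds scale and then exponentiated, which is why it is not symmetric around the odds ratio.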
Conclusion: Differences in COVID-19 testing and positivity rates between groups suggest potential health disparities trends early in the pandemic, when access to testing may have been more limited, and warrant further investigation. Furthermore, disaggregation of race and ethnicity data is crucial for health disparities research, but current federal data standards are inadequate and pose a challenge to systematically mapping race and ethnicity data from disparate systems. Additionally, the survival analysis found no evidence that the TOD and RV of the last WBC test introduce bias when predicting 30-day readmission. Lastly, the harmonized observational nature of the UC CORDS database adds a layer of complexity because of the systematic processes involved in mapping data from disparate systems. Navigating this complexity successfully requires cooperation among the investigator, the statistician, the domain expert, and the health informaticist; future research teams should plan for this broad range of expertise.