Wu, Xiaoliu

Conditional Independence Test with Neural Network

2022

Advisor(s): Sharpnack, James

Abstract

This dissertation combines two bodies of work in modern statistical inference. The first introduces and studies a novel conditional independence test using neural networks. The second consists of applied data analyses for the Healthy Davis Together (HDT) program: wastewater testing for COVID-19 infections, and an analysis of the impact of college reopening.

We propose a neural Ising model to test conditional independence between binary random variables X and Y given a potentially complex random variable Z, such as text or images. The method uses the score test statistic and employs a computationally efficient score-based bootstrap procedure [20] to generate the p-value. We extend the method to multi-class X and Y by replacing the Ising model with a restricted Boltzmann machine. Empirical studies on both simulated and real-world data show that our model has high power against the alternative H1 and reliable type-I error control. We also derive the asymptotic separability of the score test statistics under the Ising model.

On the applied side, we first summarize our collaboration with the wastewater team of the HDT initiative on the wastewater monitoring project [36]. We provide a Bayesian method for imputing Ct (cycle threshold) values via an EM-MCMC algorithm, wrapped in a user-friendly API. The algorithm produces Ct values that match the overall trend of the clinical data and correlate more strongly with the clinical data than existing methods [36].

The other data analysis project at HDT measures the impact of college reopening on the COVID-19 outbreak level in the colleges' home counties. The coronavirus disease 2019 (COVID-19) pandemic dramatically affected the 2020-2021 academic year at universities across the country, and conversely, college reopening disrupted the course of the pandemic.
We investigated COVID-19 hotspot events in “college counties”, which we defined as counties in which undergraduate students make up at least 10% of the population. Using multiple hypothesis testing, we found that the increases in cases could not be attributed to random chance. Increases in confirmed cases among college counties from mid-August to mid-September were significantly higher than in comparable non-college counties. After this reopening period, hotspots of confirmed cases did not differ between counties, regardless of college-town designation. Class setting (i.e., in-person, hybrid, or online) appeared to be associated with hotspot activity. We found no evidence of an association between testing efforts and hotspots.
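The score test with a bootstrap p-value, as described in the first part of the abstract, can be illustrated with a minimal Python/NumPy sketch. It is not the dissertation's implementation and rests on several assumptions: X and Y are coded as ±1; the model is p(x, y | z) ∝ exp(a(z)x + b(z)y + θxy) with a constant interaction θ, tested at θ = 0; the neural networks for a(z) and b(z) are replaced by logistic regression fitted with Newton's method; and a Gaussian multiplier bootstrap stands in for the score-based bootstrap of [20]. The helper names `fit_logistic` and `score_ci_test` are hypothetical. Under θ = 0 the raw θ-score is xy - E[X|z]E[Y|z]; projecting out the nuisance scores x - E[X|z] and y - E[Y|z] leaves the residual product (x - E[X|z])(y - E[Y|z]), which the sketch sums into a standardized statistic.

```python
import numpy as np


def fit_logistic(Z, t, n_iter=25, ridge=1e-6):
    """Return fitted P(t = 1 | Z) from a logistic regression with intercept.

    Hypothetical stand-in for the neural networks of the neural Ising model;
    t must be coded as 0/1. Fitted by Newton-Raphson with a tiny ridge
    penalty so the Hessian stays invertible.
    """
    A = np.column_stack([np.ones(len(Z)), Z])  # design matrix with intercept
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ w))
        W = p * (1.0 - p)  # Bernoulli variances (Newton weights)
        H = A.T @ (A * W[:, None]) + ridge * np.eye(A.shape[1])
        g = A.T @ (t - p) - ridge * w  # gradient of penalized log-likelihood
        w = w + np.linalg.solve(H, g)
    return 1.0 / (1.0 + np.exp(-A @ w))


def score_ci_test(x, y, Z, n_boot=1000, seed=0):
    """Sketch of a score test of H0: X independent of Y given Z.

    x, y: arrays with entries in {-1, +1}; Z: (n, d) covariate matrix.
    Assumed model: p(x, y | z) proportional to exp(a(z) x + b(z) y + theta x y),
    tested at theta = 0. Returns (statistic, bootstrap p-value).
    """
    # Under H0 the model factorizes, so E[X | Z] and E[Y | Z] are fit separately;
    # on the {-1, +1} scale, E[X | Z] = 2 P(X = 1 | Z) - 1.
    mx = 2.0 * fit_logistic(Z, (x + 1) / 2) - 1.0
    my = 2.0 * fit_logistic(Z, (y + 1) / 2) - 1.0
    # The raw theta-score is x*y - mx*my; projecting out the nuisance scores
    # (x - mx) and (y - my) leaves the residual product below.
    s = (x - mx) * (y - my)
    n = len(s)
    t_obs = s.sum() / np.sqrt(n)
    # Multiplier bootstrap: reweight the centered per-sample scores by i.i.d.
    # N(0, 1) draws while holding the fitted nuisance functions fixed.
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((n_boot, n))
    t_boot = e @ (s - s.mean()) / np.sqrt(n)
    # Two-sided p-value with the usual +1 correction.
    p_value = (1 + np.sum(np.abs(t_boot) >= abs(t_obs))) / (1 + n_boot)
    return t_obs, p_value
```

Because the fitted nuisance functions are held fixed, each bootstrap replicate costs only a weighted sum rather than a model refit, which is one way a score-based bootstrap can stay cheap. When Y copies X with high probability, the statistic lands far in the bootstrap tail and the p-value approaches its floor of 1/(n_boot + 1).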

UC Davis