Temporal data modeling plays a vital role in many research areas, including finance, environmental science, and neuroscience. Understanding and interpreting the evolving system that generates temporal data is of central interest. This work focuses on efficient statistical models for temporal data built on stochastic processes. In particular, we consider two flexible families of random processes: Markov processes and Gaussian processes.
The first stage of the research develops a novel hidden Markov model, based on Markov jump processes, for cervical cancer screening test data. The model captures heterogeneity across both individuals and time. We provide an efficient and scalable inference approach based on expectation maximization. To the best of our knowledge, ours is the first statistical model that scales to a population-level dataset.
Next, we consider an alternative stochastic process that is widely applied in temporal data modeling, the Gaussian process. Motivated by the kernel perspective on inducing-point-based sparse Gaussian processes, we propose a general regularization framework for sparse Gaussian processes and extend it to latent variable models. We specifically consider variational inference under this regularization framework in several settings. We show theoretically that variational inference under our regularization can be treated as maximizing a lower bound on the log likelihood of a corresponding empirical Bayesian model. We illustrate the framework in several settings on both synthetic and real datasets.
Building on our proposed regularization framework, we develop a hierarchical sparse latent Gaussian process model specifically for categorical data, and then extend it to temporal data via dynamical priors. In particular, we propose an efficient variational inference scheme that makes the model applicable to large datasets. Moreover, our model offers a way to visualize the dynamics of categorical data by summarizing them on a low-dimensional manifold.
The fourth project is motivated by electronic health record data and by the spatially varying coefficient linear coregionalization model. We propose a novel nonstationary multivariate Gaussian process model that captures time-dependent smoothness, scale, and correlation across dimensions. We emphasize one special case of the model for its computational efficiency: it admits efficient inference via Kronecker algebra. Moreover, we provide both Hamiltonian Monte Carlo inference, as a fully Bayesian approach, and maximum a posteriori (MAP) inference, as an approximate Bayesian approach. Our posterior inference provides a promising way to understand the relation between the cross-correlation of clinical variables and health status, which can contribute to early disease detection.
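To illustrate why Kronecker algebra can make inference efficient, the following is a minimal sketch, not the model proposed in this work. It assumes a covariance of the separable form A ⊗ B, where A (p × p, across output dimensions) and B (n × n, across time points) are hypothetical symmetric positive definite matrices, and it uses the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ) with column-stacking vec. The function name `kron_solve` is introduced here for illustration only.

```python
import numpy as np


def kron_solve(A, B, Y):
    """Solve (A kron B) vec(X) = vec(Y) without forming the Kronecker product.

    A is (p, p), B is (n, n), Y is (n, p); vec stacks columns.
    Because (A kron B) vec(X) = vec(B X A^T), the solution is
    X = B^{-1} Y A^{-T}, which needs only two small solves.
    """
    Z = np.linalg.solve(B, Y)        # Z = B^{-1} Y, an (n, n) system
    X = np.linalg.solve(A, Z.T).T    # X = Z A^{-T}, a (p, p) system
    return X
```

Under these assumptions the two small solves cost on the order of n³ + p³ operations for the factorizations, whereas solving with the dense (np × np) matrix directly costs on the order of (np)³.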