Groundwater plays a crucial role in our water resources. To address the growing demand for, and depletion of, groundwater resources and to devise sustainable management plans, a wide range of models has been applied to estimate and predict various hydrogeologic responses. However, uncertainty arises in all modeling applications, and quantifying and reducing that uncertainty remains a challenge for hydrogeologists, especially at unsampled locations.
The main challenge posed at unsampled locations is the lack of in-situ data, which forces us to search for alternative sources of information and to assimilate that information systematically in order to obtain conditioned estimates of hydrologic responses.
To that end, the primary objective of this dissertation is the advancement of stochastic modeling approaches targeting unsampled locations.
Within this objective, we propose three stochastic modeling approaches, each designed to assimilate a different alternative source of information.
First, we propose the Rapid Impact Modeling (RIM) approach to efficiently assimilate in-situ soft data (i.e., in-situ data that are related to the target response via transfer functions) for obtaining conditioned estimates. RIM improves upon existing approximate Bayesian computation approaches by (1) bypassing the estimation of posterior distributions of model parameters, thus reducing the computational burden, and (2) relaxing the need to reduce data to summary statistics, thus avoiding information loss.
To demonstrate the power of RIM, we address the challenge of data scarcity against the backdrop of an underground tunnel in China that is 7 km long and hundreds of meters deep, a typical example of a heavy-impact yet poorly sampled site. Through the demonstration, we also find that goal-oriented site characterization is in many cases more useful in applications than parameter-oriented characterization.
Second, we turn our attention to the assimilation of ex-situ data (i.e., data from locations other than the location of interest) via regionalization, transferring information obtained at sampled locations to unsampled ones. The reliability of regionalization depends on (1) the underlying system of hydrologic similarity, as well as (2) the approach by which information is transferred.
We propose a nested structure that couples a classification tree with Bayesian additive regression trees, named the nested tree-based modeling approach. It is designed as an advanced regionalization technique that can model non-linear predictor-response relationships and provides a Bayesian representation of the uncertainties in both the model parameters and the model structure.
In addition, we integrate the approach with a hypothesis of two-level hierarchical hydrologic similarity to investigate the dynamic behavior of hydrologic similarity. In a case study of groundwater recharge estimation, we show how the nested tree-based modeling approach and the hierarchical similarity hypothesis can reveal how the controls of hydrologic similarity vary under different conditions. Together, the nested tree-based modeling approach and our hierarchical similarity hypothesis contribute to the understanding of the physical principles governing robust information transfer.
Third, we look at situations of extreme data scarcity, where in-situ data are unavailable and the ex-situ data take the form of bounds on plausible values rather than point observations. We propose a two-level Bayesian hierarchical model in which ex-situ bounds are assimilated via truncation of distributions rather than data imputation, thus avoiding artificial biases. Furthermore, our approach can model the ex-situ bounds as random variables to account for their potential uncertainties. Our proposed approach not only contributes to Bayesian regionalization using ex-situ bounds but also provides guidance for establishing ex-situ bounds in future applications.
The three approaches are all based on Bayes' rule and can all be considered applications of Bayesian inference. They represent sophisticated assimilation of various alternative forms of information and are designed to tackle the ultimate challenge of large modeling uncertainty in the face of data scarcity. We expect the approaches proposed in this dissertation to contribute to the advancement of Bayesian uncertainty quantification and reduction at unsampled locations.