Exploring Multivariate Extreme Value Theory with Applications to Anomaly Detection
- Trubey, Peter
- Advisor(s): Sansó, Bruno
Abstract
Significant work in extreme value analysis has generalized the univariate generalized Pareto distribution to a multivariate setting. We consider the constructive definition of the multivariate Pareto distribution, which factorizes a Pareto random vector into independent radial and angular components: the former follows a Pareto distribution, while the latter follows a distribution with no closed form, supported on the surface of the positive orthant of the L-infinity-norm unit hypercube. In this document, we propose a method of inferring this angular distribution as a realization of a Bayesian non-parametric (BNP) mixture of independent random gamma vectors projected onto an arbitrary L-p-norm unit hypersphere, the support of which approaches the support of the angular component as p goes to infinity. First, we explore applications of this BNP mixture of projected gammas in characterizing the dependence structure of extremes. Our motivating example is integrated vapor transport, data pertaining to an atmospheric river transporting moisture from the Pacific Ocean across California. We observe clear but heterogeneous geographic dependence. Second, we consider the application of the BNP mixture of projected gammas to a novelty detection setting, developing novelty scores appropriate to the support. To expand the applicability of our methods, we develop a categorical data model and consider the extension of the angular novelty scores to categorical and mixed data settings. We find that our model and scores compare favorably to canonical novelty scores on canonical novelty detection datasets. Finally, we seek to understand the limitations of the BNP mixture of projected gammas by applying the model at a large scale, to storm surge data at specified locations as simulated under the Sea, Lake, and Overland Surges from Hurricanes (SLOSH) model.
We observe issues in model fidelity in this highly multivariate setting, both in recovering the marginal distributions and in capturing the dependence structure. We also observe that as dimensionality increases, the number of extant clusters decreases. To ameliorate this loss of granularity, we propose a regression model that invokes a low-dimensional representation of the output space. We use these models to explore storm surge at sites of critical infrastructure in the Delaware Bay watershed.
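The projection described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: it uses a single vector of independent gammas rather than a BNP mixture, and the helper name `project_to_lp_sphere`, the shape parameters, the sample size, and the radial Pareto index of 1 are all assumptions chosen for illustration. The sketch projects gamma draws onto L-p unit spheres for increasing p, compares them with the L-infinity projection, and then multiplies the angular draws by an independent Pareto radius.

```python
import numpy as np


def project_to_lp_sphere(x, p):
    """Rescale each row of x to unit L-p norm (p may be np.inf)."""
    norms = np.linalg.norm(x, ord=p, axis=1, keepdims=True)
    return x / norms


rng = np.random.default_rng(0)

# Hypothetical shape parameters for three independent gamma coordinates.
shapes = np.array([2.0, 0.5, 1.5])
gammas = rng.gamma(shape=shapes, scale=1.0, size=(1000, 3))

# Target support: the L-infinity unit "sphere" in the positive orthant,
# where the largest coordinate of every row equals 1.
angles_inf = project_to_lp_sphere(gammas, np.inf)

# Largest coordinate-wise gap between the L-p and L-infinity projections.
gaps = {
    p: float(np.abs(project_to_lp_sphere(gammas, p) - angles_inf).max())
    for p in (2.0, 10.0, 50.0)
}

# Independent radial component. Generator.pareto draws from the Lomax
# (Pareto II) distribution, so adding 1 gives a classical Pareto with
# minimum 1. The index 1.0 is an assumption of this sketch.
radius = 1.0 + rng.pareto(1.0, size=(1000, 1))
pareto_draws = radius * angles_inf

print(gaps)
```

The printed gaps shrink as p grows, which mirrors the statement that the support of the projected gammas approaches the support of the angular component as p goes to infinity.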