New Kernel-based Methods for High-dimensional Inferences
- Song, Hoseung
- Advisor(s): Chen, Hao
Abstract
As we enter the big data era with technological advances in data collection, high-dimensional and complex data are becoming prevalent, and the development of effective analysis methods is gaining more attention from researchers in statistics and data science. Many existing approaches are parametric, but they are highly context-specific. Kernel-based methods are widely used as a nonparametric approach, and they have the potential to capture changes in the distribution. This dissertation aims to develop novel kernel-based methods for high-dimensional data on two problems: (i) two-sample testing and (ii) change-point analysis. Kernel two-sample tests have been widely used for high-dimensional data as an elegant nonparametric framework for testing equality of distributions. However, existing tests based on kernel embeddings of probability distributions into reproducing kernel Hilbert spaces (RKHS) do not work well for a wide range of alternatives when the dimension of the data is moderate to high, due to the curse of dimensionality. We propose a new test statistic that makes use of patterns under high dimension and achieves substantial power improvement over existing kernel two-sample tests for general alternatives. We also propose an alternative testing procedure that maintains high power with little computational cost, offering easy off-the-shelf tools for large datasets. We also consider the testing and estimation of change-points, the locations in a sequence where the distribution abruptly changes. Compared with two-sample testing problems, kernel-based methods in change-point analysis have not been well explored. We propose a new kernel-based framework that exhibits high power in detecting and estimating the location of the change-point under general alternatives.
Analytic approximations to the significance of the new test statistics for both single change-point and changed-interval alternatives are derived and fast tests are proposed, offering easy off-the-shelf tools for large datasets.
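For orientation, the kind of existing RKHS-embedding test that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard unbiased squared maximum mean discrepancy (MMD) estimate with a permutation p-value; it is not the new statistic or the fast tests proposed in the dissertation. The Gaussian kernel, the median-heuristic bandwidth, and all function names (gaussian_gram, median_bandwidth, mmd2_unbiased, mmd_permutation_test) are assumptions made for this example.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives from round-off


def median_bandwidth(Z):
    """Median heuristic: median pairwise distance (an assumed default)."""
    d2 = pairwise_sq_dists(Z)
    upper = np.triu_indices(len(Z), k=1)
    return np.sqrt(np.median(d2[upper]))


def gaussian_gram(Z, bandwidth):
    """Gaussian kernel Gram matrix on the pooled sample."""
    return np.exp(-pairwise_sq_dists(Z) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased estimate of squared MMD from a pooled Gram matrix K."""
    m, n = len(idx_x), len(idx_y)
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    # Within-sample terms exclude the diagonal (i == j).
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, n_perm=500, seed=0):
    """Return the observed statistic and a permutation p-value."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    m, N = len(X), len(X) + len(Y)
    K = gaussian_gram(Z, median_bandwidth(Z))
    observed = mmd2_unbiased(K, np.arange(m), np.arange(m, N))
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(N)  # relabel pooled observations
        if mmd2_unbiased(K, perm[:m], perm[m:]) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Calling mmd_permutation_test(X, Y) on two arrays of shape (sample size, dimension) returns the statistic and its p-value. The permutation loop reuses one pooled Gram matrix, but it still costs quadratic time per permutation, which illustrates why analytic approximations to the significance are attractive for large datasets.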