Statistical Learning and Applications in Sparse Neural Networks
- Huang, Ke
- Advisor(s): Ma, Shujie
Abstract
Artificial intelligence technology, together with deep neural networks (DNNs), has evolved rapidly in recent years, demonstrating remarkable performance on complex tasks. Despite their broad impact and general effectiveness, the theoretical foundations of DNNs are not well established. In this dissertation, we investigate the statistical properties of two sparse DNN architectures: convolutional neural networks (CNNs) and a novel sparse deep ReLU neural network (SDRN). We consider the neural network estimators obtained from empirical risk minimization with a Lipschitz loss function and develop non-asymptotic excess risk bounds for these estimators. The established bounds show that both CNNs and SDRNs can achieve robust approximation accuracy. Additionally, these bounds provide theoretical guidance for aligning network complexity with sample size to ensure the best possible learning performance. Furthermore, we establish theory showing that, under certain assumptions, CNN and SDRN estimators can alleviate the \textquotedblleft curse of dimensionality\textquotedblright. These theoretical results provide important insights into the understanding of DNNs.
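To make the estimation framework concrete, the following is a minimal sketch of empirical risk minimization with a 1-Lipschitz loss (here the absolute-error loss) for a small one-hidden-layer ReLU network. All sizes, learning rates, and the synthetic data are illustrative assumptions, not the dissertation's actual estimator or network classes.

```python
# Minimal illustrative sketch (not the dissertation's estimator):
# empirical risk minimization with a 1-Lipschitz loss (absolute error)
# for a one-hidden-layer ReLU network, fit by full-batch (sub)gradient descent.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = sin(2x) + noise (hypothetical example)
n = 200
X = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=n)

# One-hidden-layer ReLU network; width 16 is an arbitrary choice
width = 16
W1 = rng.normal(scale=0.5, size=(1, width))
b1 = np.zeros(width)
W2 = rng.normal(scale=0.5, size=width)
b2 = 0.0

def forward(X):
    H = np.maximum(X @ W1 + b1, 0.0)      # ReLU hidden layer
    return H @ W2 + b2, H

def empirical_risk(pred, y):
    return np.mean(np.abs(pred - y))       # absolute loss is 1-Lipschitz

lr = 0.05
for _ in range(2000):
    pred, H = forward(X)
    g = np.sign(pred - y) / n              # subgradient of the empirical risk
    W2 -= lr * (H.T @ g)
    b2 -= lr * g.sum()
    dH = np.outer(g, W2) * (H > 0)         # backpropagate through the ReLU
    W1 -= lr * (X.T @ dH)
    b1 -= lr * dH.sum(axis=0)

pred, _ = forward(X)
final_risk = empirical_risk(pred, y)
```

The excess risk bounds in the dissertation control how far such an empirical risk minimizer's population risk can exceed the best achievable risk, as a function of network complexity and sample size.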
To bridge the gap between theory and application, this dissertation also focuses on the practical implementation of DNNs in sleep research. Raw electroencephalogram (EEG) signals in sleep data typically exhibit class imbalance and individual heterogeneity, both of which significantly degrade the classification performance of machine learning algorithms. We propose a new GAN architecture (called EGAN), adapted to the features of EEG signals, for data augmentation to address the class imbalance problem, and we design a cost-free ensemble learning strategy with CNNs to alleviate the heterogeneity problem. We show that the proposed method improves classification accuracy compared to several existing state-of-the-art methods. This application not only showcases the practical utility of CNNs but also serves as an example of how theoretical foundations can be translated into real-world solutions.
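The core idea behind a cost-free ensemble for heterogeneous subjects can be sketched as follows: fit one model per subject group and aggregate predictions by majority vote, so no extra labeled data is needed. This is a hypothetical toy illustration (nearest-centroid classifiers standing in for CNNs); it does not reproduce the EGAN/CNN pipeline described above.

```python
# Hypothetical sketch of majority-vote ensembling over per-subject models
# (toy nearest-centroid classifiers stand in for the per-group CNNs).
import numpy as np

rng = np.random.default_rng(1)

def train_nearest_centroid(X, y):
    """Toy stand-in for a per-group classifier: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[d.argmin(axis=0)]

# Three "subjects" with shifted feature distributions (heterogeneity)
models = []
for shift in (0.0, 0.5, 1.0):
    X = rng.normal(size=(100, 4)) + shift
    y = (X[:, 0] > shift).astype(int)      # per-subject labeling rule
    models.append(train_nearest_centroid(X, y))

# Majority vote over the per-subject models on new data
X_new = rng.normal(size=(50, 4))
votes = np.stack([predict(m, X_new) for m in models])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
```

The vote averages out subject-specific idiosyncrasies, which is the same intuition the ensemble strategy with CNNs exploits for heterogeneous EEG recordings.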