This dissertation presents advancements in both the theory and applications of deep learning. The objectives include reducing complexity, enhancing interpretability, and offering superior methodologies for solving signal processing problems.
On the theoretical side, we develop two major advances: one for complexity and one for interpretability. For complexity, we establish tighter upper bounds on the number of computational units required for a rectified linear unit (ReLU) neural network to represent or compute any given continuous piecewise linear (CPWL) function. Specifically, we prove that any CPWL function can be represented by a ReLU network whose number of hidden neurons is at most a quadratic function of the number of pieces of the CPWL function. This quadratic bound is independent of the input dimension, improves exponentially on previous bounds, and represents the state of the art in the literature. When the number of linear components is also known, the bound tightens to a bilinear one. We also prove a new upper bound on the number of pieces in terms of the number of linear components, enabling alternative descriptions of neural complexity. Beyond existence, we develop a polynomial-time algorithm that identifies a neural network satisfying these tighter bounds, shedding light on reverse engineering and network pruning.

For interpretability, we prove that the skip connection, the most popular building block in deep learning, guarantees improved representations over residual blocks when the expansion layer is sufficiently large. This result explains (a) why the residual network (ResNet) avoids the degradation problem and can be scaled to thousands of layers, (b) why the wide ResNet is superior to the ResNet, and (c) why bottleneck blocks are more economical than basic blocks. To develop guarantees for the ResNet, we design a simplified architecture, the residual nonlinear estimator (ResNEst), and propose a new architecture, the augmented ResNEst. Under mild assumptions, we prove that every local minimizer of the ResNEst can be a global minimizer despite the nonconvex optimization landscape, implying that any local minimizer can provably outperform the best linear predictor.
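In symbols, the two headline theoretical results can be summarized informally as follows, where q denotes the number of pieces of the CPWL function, L denotes the training loss of the ResNEst, and \hat{\theta} denotes any of its local minimizers; this notation is ours, and the precise polynomial, constants, assumptions, and strictness of the inequalities are those stated in the corresponding chapters:

\[
  \#\{\text{hidden neurons}\} \;=\; O\!\left(q^{2}\right)
  \qquad\text{and}\qquad
  L\big(\hat{\theta}\big) \;\le\; \min_{g\ \text{linear}} L(g).
\]

When the number of distinct linear components is also known, the first bound becomes bilinear in the number of pieces and the number of components.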
On the application side, we develop a new deep learning-based methodology, subspace representation learning, for the classic direction-of-arrival (DoA) estimation problem in array processing. The codomain of a deep learning model is defined as a union of Grassmannians reflecting signal subspaces of different dimensions. We propose a family of distance functions on Grassmannians and, in particular, use geodesic distances to train a model; we also prove that a ReLU network can approximate signal subspaces. Because a subspace is invariant to the choice of its basis, our methodology enlarges the solution space of a model compared to existing approaches that learn covariance matrices. Furthermore, due to its geometry-agnostic nature, our methodology is robust to array imperfections. Numerical results show that subspace representation learning significantly outperforms existing semidefinite programming-based and deep learning-based covariance matrix reconstruction approaches across a wide range of scenarios.
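As a minimal illustration of the kind of distance involved, the sketch below computes the geodesic distance between two subspaces, each represented by an orthonormal basis, via their principal angles. It uses the standard principal-angle formula and is written in NumPy for concreteness; the function names are ours, and it is not the exact training loss used in the dissertation.

    import numpy as np

    def orthonormal_basis(a):
        # Orthonormalize the columns of a raw n-by-k matrix so that it
        # represents a point on the Grassmannian (a k-dim subspace of R^n).
        q, _ = np.linalg.qr(a)
        return q

    def grassmann_geodesic_distance(u, v):
        # u, v: n-by-k matrices with orthonormal columns.
        # The principal angles between the two subspaces are the arccosines
        # of the singular values of u^T v; the geodesic (arc-length) distance
        # is the Euclidean norm of the vector of principal angles.
        s = np.linalg.svd(u.T @ v, compute_uv=False)
        s = np.clip(s, -1.0, 1.0)  # guard against round-off outside [-1, 1]
        return np.linalg.norm(np.arccos(s))

    # Toy usage: two 2-dimensional subspaces of R^4.
    rng = np.random.default_rng(0)
    u = orthonormal_basis(rng.standard_normal((4, 2)))
    v = orthonormal_basis(rng.standard_normal((4, 2)))
    print(grassmann_geodesic_distance(u, v))  # zero iff the subspaces coincide

A distance of this form is invariant to the choice of basis for either subspace, which is precisely the property that enlarges the solution space of a model relative to approaches that regress covariance matrices.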