Rapid and ongoing technological developments enable researchers to collect large-scale, high-dimensional data in a wide range of areas in science and technology. The availability of these data, together with increasing computational resources, has led to the remarkable success of deep learning in a variety of tasks such as computer vision, natural language processing, and reinforcement learning. However, deep neural networks for modeling high-dimensional data can be hard to interpret and impose heavy computational burdens. One approach for addressing some of these issues is to learn sparse representations.
In my thesis, I present several approaches to facilitate efficient learning of sparse representations in deep learning. I first describe a scalable variational inference algorithm to perform variable selection in Bayesian settings using non-local priors. There, I show that our method approximates the posterior estimates with the same degree of precision in variable selection as traditional MCMC algorithms, while providing an order-of-magnitude speedup. Then I propose a deep auto-regressive generative model to efficiently learn sparse distributions and demonstrate its effectiveness on the problem of generating high-quality jet images in particle physics to speed up scientific discovery. Finally, I introduce an approach that adapts transformers, the current state-of-the-art deep learning models for natural language processing tasks, to software data. There, I tackle the problem of learning sparse location information to identify exception-prone segments of source code in software engineering.