A number of competing concerns slow adoption of deep learning for computer vision on “edge” devices. Edge devices provide only limited resources for on-device algorithms to
employ, constraining power, memory, and storage usage. Examples include mobile phones,
autonomous vehicles, and virtual reality headsets, which demand both high accuracy and low
latency, two objectives competing for resources.
To tackle this Sisyphean task, modern methods expend gargantuan amounts of computation, often exceeding thousands of GPU hours or even years of GPU compute to design a
single neural network. Worse still, these works maximize just one performance metric –
accuracy – under a single set of resource constraints. What if the set of resource constraints
changes? What if additional performance metrics, such as explainability or generalization, rise
to the forefront? Modern methods for designing efficient neural networks are handicapped by
excessive computational requirements in pursuit of goals that are too narrow and too singular.
This thesis tackles the bottlenecks of modern methods directly, achieving state-of-the-art performance by efficiently designing efficient deep neural networks. These improvements
do not merely reduce computation or merely improve accuracy; instead, our methods improve
performance and reduce computational requirements, despite increasing the search space size by
orders of magnitude. We also demonstrate missed opportunities in performance metrics
beyond accuracy, redesigning the task so that accuracy, explainability, and generalization
improve jointly, defying the conventional wisdom that explainability and
accuracy participate in a zero-sum game.
This thesis culminates in a set of models that set new flexibility and performance standards for production-ready models: state-of-the-art in accuracy, explainable, generalizable,
and configurable for any set of resource constraints in just CPU minutes.